(54) Altered thermostable DNA polymerases for sequencing

(57) Modified thermostable DNA polymerases hav- 
ing enhanced efficiency for incorporating unconven- 
tional nucleotides such as those labeled with 
fluorescein family dyes are provided. Such modified 
thermostable DNA polymerases can be prepared by 
methods of the recombinant DNA technology. Recom- 
binant thermostable DNA polymerases prepared in 
accordance with the present invention are advantageous
in many in vitro DNA synthesis applications, such
as DNA sequencing, synthesis of labeled DNA and the
production of labeled primer extension products. The 
said polymerase enzymes are particularly useful in 
chain termination nucleic acid sequencing methods. 
Nucleic acids encoding the recombinant thermostable 
DNA polymerases of the present invention, as well as 
vectors and host cells comprising same are also pro- 
vided. Kits comprising the recombinant thermostable 
DNA polymerases of the present invention are provided 
too. The polymerases and methods of the present 
invention provide significant cost and efficiency advan- 
tages for DNA sequencing. 
[0001] The present invention relates to thermostable DNA polymerases which have enhanced efficiency for incorporating
nucleoside triphosphates labeled with fluorescein family dyes. The present invention provides means for isolating
and producing such altered polymerases. The enzymes of the invention are useful for many applications in molecular 
biology and are particularly advantageous for nucleic acid sequencing. 

Background of the Invention

[0002] Incorporation of nucleoside triphosphates (dNTPs) labeled with fluorescent dyes is important for many in vitro 
DNA synthesis applications. For example, dye-terminator DNA sequencing reactions require the incorporation of fluo- 
rescent dideoxynucleotide analogues for termination and labeling. In addition, in vitro synthesis of labeled products 
may involve incorporation of fluorescent nucleotides or nucleotide analogues. For example, fluorescently labeled DNA 
has been used in hybridization assays using microarrays of immobilized probes (Cronin et al., 1996, Human Mutation
7:244).

[0003] To assure fidelity of DNA replication, DNA polymerases have a very strong bias for incorporation of their nor- 
mal substrates referred to herein as conventional deoxynucleoside triphosphates (dNTPs), and against incorporation 
of unconventional dNTPs including dNTPs and dNTP analogues labeled with fluorescent dyes. In the cell, this property 
attenuates the incorporation of abnormal bases such as dUTP in a growing DNA strand. In vitro, this characteristic is 
particularly evident where both conventional and unconventional fluorescently-labeled nucleoside triphosphates are 
present, such as in DNA sequencing reactions using a version of the dideoxy chain termination method that utilizes dye-terminators
(Lee et al., 1992, Nuc. Acids. Res. 20:2471, which is incorporated herein by reference).
[0004] Commercially available DNA cycle sequencing kits for dye-terminator methods use chain terminator ddNTPs 
labeled with fluorescent dyes of the rhodamine family. However, rhodamine dyes are zwitterionic in charge and nucleo- 
side triphosphates labeled with these dyes migrate anomalously in the electrophoretic gels used to separate the 
sequencing products for detection. This property of rhodamine family dyes necessitates making modifications in the 
standard sequencing protocol which include the use of dITP and an additional processing step before electrophoresis. 
[0005] In contrast, negatively charged fluorescent dyes such as fluorescein family dyes allow 1) better separation
between the labeled nucleoside triphosphates and labeled primer extension products, and 2) better electrophoretic 
migration of the labeled sequencing products than neutral or positively charged fluorescent dyes. Thus, the use of flu- 
orescein family dyes avoids the need for additional processing steps required with the use of rhodamine family dyes. 
However, available dyes of the fluorescein family are not ideal for use in current commercially available DNA cycle
sequencing formats because ddNTPs labeled with these dyes are not efficiently incorporated into sequencing products
using these formats. Consequently, there is a need for commercially available thermostable DNA polymerases that can
efficiently incorporate both conventional and fluorescein-labeled nucleotides. The present invention serves to meet that
need. Further, an unexpected property of the mutant enzymes of this invention is the increased rate of primer extension
relative to the corresponding wild-type enzyme. Another unexpected property is the increased uniformity of incorporation
of the various terminator nucleotides in automated DNA sequence analysis.

Summary of the Invention 

[0006] The present invention provides template-dependent thermostable DNA polymerase enzymes having reduced 
discrimination against incorporation of nucleotides labeled with fluorescein family dyes compared to previously charac- 
terized enzymes. These enzymes incorporate nucleotides, including deoxynucleotides (dNTPs) and base analogues 
such as dideoxynucleotides (ddNTPs), that are labeled with fluorescein family dyes more efficiently than conventional
thermostable enzymes. Genes encoding these enzymes are also provided by the present invention, as are recombinant 
expression vectors for providing large amounts of purified enzymes. 

[0007] By the present invention, a region of criticality within thermostable DNA polymerases is identified which affects 
the polymerase's ability to incorporate nucleotides labeled with fluorescein family dyes, while retaining the ability to 
incorporate faithfully natural nucleotides. This region of criticality, or Critical Motif, can be introduced into genes for ther- 
mostable DNA polymerases by recombinant DNA methods such as site-specific mutagenesis to provide the advan- 
tages of the invention. 

[0008] Thus in one aspect, the invention provides recombinant thermostable DNA polymerase enzymes which are 
characterized in that the enzymes have been mutated to produce the Critical Motif and have reduced discrimination 
against incorporation of nucleotides labeled with fluorescein family dyes, in comparison to the corresponding wild-type
enzyme.
[0009] In this aspect, the invention provides recombinant thermostable DNA polymerase enzymes which are characterized
in that a) in its native form said polymerase comprises the amino acid sequence (given in one-letter code)
LSXXLX(V/I)PXXE (SEQ ID NO: 1), where X is any amino acid; b) the X at position 4 in said sequence is mutated in
comparison to said native sequence, except that X is not mutated to E; and c) said thermostable DNA polymerase has
reduced discrimination against incorporation of nucleotides labeled with fluorescein family dyes in comparison to the
native form of said enzyme. In the three-letter code, this amino acid sequence is represented as
LeuSerXaaXaaLeuXaaXaaProXaaXaaGlu (SEQ ID NO: 1), whereby "Xaa" at positions 3, 4, 6, 9, and 10 of this sequence are any amino
acid residue, and "Xaa" at position 7 of this sequence is Val or Ile.
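
As an illustration only, the one-letter motif of SEQ ID NO: 1 can be written as a regular expression and used to locate position 4 of the motif in a protein sequence. The Python sketch below is not part of the specification: the function names are hypothetical, and the two test fragments are the 11-residue sequence of SEQ ID NO: 3 (with E or K at position 4) surrounded by placeholder flanking residues.

```python
import re

# X in the motif means "any amino acid"; here that is taken to be the
# 20 standard residues in one-letter code (an assumption of this sketch).
AA = "[ACDEFGHIKLMNPQRSTVWY]"

# SEQ ID NO: 1 in one-letter code: LSXXLX(V/I)PXXE
CRITICAL_MOTIF = re.compile(f"LS{AA}{AA}L{AA}[VI]P{AA}{AA}E")


def find_motif(protein):
    """Return (0-based start index, matched text) of the first hit, or None."""
    m = CRITICAL_MOTIF.search(protein)
    return (m.start(), m.group()) if m else None


def position4(protein):
    """Return the residue at motif position 4, or None if no motif is found."""
    hit = find_motif(protein)
    return hit[1][3] if hit else None


# Hypothetical fragments: "AAA" and "GGG" are placeholder flanks.
native_fragment = "AAALSQELAIPYEEGGG"
mutant_fragment = "AAALSQKLAIPYEEGGG"

print(find_motif(native_fragment))
print(position4(native_fragment), position4(mutant_fragment))
```

Comparing `position4` for a native and a recombinant sequence is one simple way to check criterion b) above, namely that the residue at position 4 has changed and has not been changed to E.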

[0010] In another embodiment, the recombinant thermostable DNA polymerases are characterized in that a) the
native form of the polymerase comprises the amino acid sequence LS(Q/G)XL(S/A)IPYEE (SEQ ID NO: 2), where X is
any amino acid; b) the X at position 4 in said sequence is mutated in comparison to said native sequence, except that
X is not mutated to E; and c) said thermostable DNA polymerase has reduced discrimination against incorporation of
nucleotides labeled with fluorescein family dyes in comparison to the native form of said enzyme. In the three-letter
code, this amino acid sequence is represented as LeuSerXaaXaaLeuXaaIleProTyrGluGlu (SEQ ID NO: 2), whereby
"Xaa" at position 3 is Gln or Gly, "Xaa" at position 4 is any amino acid, and "Xaa" at position 6 is Ser or Ala. In a preferred
embodiment, the amino acid sequence is LSQXLAIPYEE (SEQ ID NO: 3), where X is any amino acid. In the three-letter
code, this amino acid sequence is represented as LeuSerGlnXaaLeuAlaIleProTyrGluGlu (SEQ ID NO: 3), whereby
"Xaa" at position 4 is any amino acid. In a more preferred embodiment, the "Xaa" at position 4 is Lys.
[0011] In yet another embodiment, the recombinant thermostable DNA polymerases are characterized in that a) the
native form of the polymerase comprises the amino acid sequence LSVXLG(V/I)PVKE (SEQ ID NO: 4); b) the X at position
4 in said sequence is mutated in comparison to said native sequence, except that X is not mutated to E; and c) said
thermostable DNA polymerase has reduced discrimination against incorporation of nucleotides labeled with fluorescein
family dyes in comparison to the native form of said enzyme. In the three-letter code, this amino acid sequence is represented
as LeuSerValXaaLeuGlyXaaProValLysGlu (SEQ ID NO: 4), whereby "Xaa" at position 4 is any amino acid and
"Xaa" at position 7 is Val or Ile. In a preferred embodiment, the amino acid sequence is LSVXLGVPVKE (SEQ ID NO:
5) where X at position 4 is any amino acid. In the three-letter code, this amino acid sequence is represented as
LeuSerValXaaLeuGlyValProValLysGlu (SEQ ID NO: 5), whereby "Xaa" at position 4 is any amino acid. In a more preferred
embodiment, the "Xaa" at position 4 is Arg. In another preferred embodiment, the amino acid sequence is
LSVXLGIPVKE (SEQ ID NO: 6) where X at position 4 is any amino acid. In the three-letter code, this amino acid sequence is
represented as LeuSerValXaaLeuGlyIleProValLysGlu (SEQ ID NO: 6), whereby "Xaa" at position 4 is any amino acid.
In a more preferred embodiment, the "Xaa" at position 4 is Arg.
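
The correspondence between the one-letter and three-letter forms of these motifs is mechanical and can be checked with a short script. The following Python sketch is an illustration, not part of the specification; the function name is hypothetical, and it assumes that any parenthesized alternative such as (V/I), like X itself, is written as "Xaa" in three-letter code, as is done for SEQ ID NOs: 2 and 4 above.

```python
import re

# Standard one-letter to three-letter amino acid codes, plus X -> Xaa.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "X": "Xaa",
}


def to_three_letter(motif):
    """Convert a one-letter motif such as 'LSVXLG(V/I)PVKE' to three-letter code.

    A parenthesized group of alternatives, e.g. (V/I), becomes a single 'Xaa'.
    """
    tokens = re.findall(r"\([A-Z](?:/[A-Z])+\)|[A-Z]", motif)
    return "".join("Xaa" if t.startswith("(") else THREE_LETTER[t] for t in tokens)


for seq_id, motif in [(2, "LS(Q/G)XL(S/A)IPYEE"), (3, "LSQXLAIPYEE"),
                      (4, "LSVXLG(V/I)PVKE"), (6, "LSVXLGIPVKE")]:
    print(f"SEQ ID NO: {seq_id}  {motif}  {to_three_letter(motif)}")
```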

[0012] In another aspect of this invention, the particular region of criticality of this invention can be combined with
motifs in other regions of the polymerase gene that are known to provide thermostable DNA polymerases with reduced
discrimination against incorporation of unconventional nucleotides such as rNTPs and ddNTPs. As exemplified herein,
a recombinant Thermus aquaticus (Taq) DNA polymerase enzyme containing two mutations was constructed. The first
mutation was an E to K mutation in the X residue at position 4 of the critical motif of this invention. The second mutation
was a mutation allowing more efficient incorporation of ddNTPs known as the F667Y mutation. This mutation is a
phenylalanine to tyrosine mutation at position 667 of Taq DNA polymerase (described in U.S. Patent No 5,614,365 and U.S.
Serial No 8/448,223 and herein incorporated by reference). When used in a sequencing reaction with fluorescein dye
family-labeled ddNTPs, the E681K F667Y double mutant enzyme was found to produce a readable sequencing ladder.
Thus, in one embodiment, a motif conferring reduced discrimination toward dideoxynucleotides is combined with the
critical motif of this invention to provide an enzyme having an increased efficiency of incorporation of both labeled and
unlabeled ddNTPs.
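
Mutation names such as E681K and F667Y follow the pattern wild-type residue, 1-based position, replacement residue. As a hedged illustration of how the double mutant is specified, the Python sketch below parses such names and applies them to a sequence. The helper name is hypothetical, and the sequence used is a placeholder of arbitrary length with only residues 667 and 681 set; it is not the Taq DNA polymerase sequence.

```python
import re


def apply_mutation(seq, name):
    """Apply a substitution named <wild-type><1-based position><new>, e.g. E681K."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
    if m is None:
        raise ValueError(f"unrecognized mutation name: {name}")
    wild, pos, new = m.group(1), int(m.group(2)), m.group(3)
    # Refuse to mutate if the sequence does not carry the stated wild-type residue.
    if pos < 1 or pos > len(seq) or seq[pos - 1] != wild:
        raise ValueError(f"{name}: residue {pos} is not {wild}")
    return seq[:pos - 1] + new + seq[pos:]


# Placeholder sequence (NOT the real Taq sequence): all Ala except
# Phe at position 667 and Glu at position 681.
residues = ["A"] * 700
residues[667 - 1] = "F"
residues[681 - 1] = "E"
placeholder = "".join(residues)

double_mutant = apply_mutation(apply_mutation(placeholder, "F667Y"), "E681K")
print(double_mutant[667 - 1], double_mutant[681 - 1])
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are easy to make when positions are quoted relative to the full-length enzyme.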

[0013] In addition, the E681K F667Y mutant enzyme was unexpectedly found to exhibit a significantly increased 
extension rate relative to an enzyme with the F667Y mutation alone. Thus, in another embodiment of the invention,
introduction of the critical motif into a thermostable DNA polymerase enzyme, alone or in combination with other motifs, 
produces enzymes having an increased extension rate. The double mutant enzyme was also unexpectedly found to 
produce more uniform peak heights in dye-terminator dideoxy sequencing using rhodamine-labeled terminators. Thus, 
in yet another embodiment, introduction of the critical motif into a thermostable DNA polymerase enzyme produces 
enzymes displaying more uniform peak heights in DNA sequencing methods using rhodamine dye family labeled terminators.

[0014] In another embodiment, a mutation allowing more efficient incorporation of rNTPs, such as the glutamic acid 
to glycine mutation at position 615 of Taq DNA polymerase, or E615G mutation (described in European Patent Application,
Publication No. EP-A-823 479, herein incorporated by reference), is combined with the critical motif of this invention
to provide an enzyme having an increased efficiency of incorporation of ribonucleotides labeled with fluorescein
family dyes.

[0015] In another aspect of this invention, genes encoding the polymerases of this invention are also provided. Specifically,
genes encoding recombinant thermostable polymerases comprising the critical motif of this invention are provided.
Also included in this aspect are genes encoding combinations of two or more mutations that include mutations
producing the critical motif of this invention.

[0016] In yet another aspect, the invention also provides improved methods of DNA sequencing that allow the use of
lower concentrations of fluorescein dye family-labeled ddNTPs, thereby reducing the cost of performing the reactions.
The improved methods of the invention also allow the use of lower ratios of fluorescein dye family-labeled ddNTPs to
dNTPs. Use of these methods results in numerous advantages, including more efficient polymerization, lower concentrations
of template nucleic acid being required, and a decreased likelihood of introducing inhibitors into the reaction
mix. These advantages also facilitate the sequencing of long templates. The invention also provides improved methods
of sequencing wherein sequencing reactions can be loaded directly onto sequencing gels for subsequent electrophoresis
without intermediate purification.

[0017] Thus, in one embodiment of the invention, the invention provides improved methods for determining the
sequence of a target nucleic acid using a recombinant enzyme which a) has a mutation at position 4 which produces
the critical motif of this invention and b) has reduced discrimination against incorporation of nucleotides labeled with
fluorescein family dyes in comparison with the corresponding wild-type enzyme. Also within the scope of this invention are
improved sequencing methods using thermostable DNA polymerase enzymes derived from thermophilic species,
where the enzymes contain naturally occurring sequence variations that produce the critical motif of this invention.
These native enzymes can also provide reduced discrimination against incorporation of unconventional nucleotides. In
this embodiment, the invention provides improved methods of sequencing using a native thermostable DNA polymerase
a) having the critical motif of this invention wherein the amino acid in position 4 is not Glu and b) having reduced
discrimination against incorporation of nucleotides labeled with fluorescein family dyes.

[0018] Also within the scope of this invention are improved methods of producing DNA labeled with fluorescein family 
dyes. The enzymes of the invention efficiently incorporate fluorescein-labeled dNTPs in a polymerase chain reaction 
method, producing amplified products that are labeled at various sites with fluorescein family dyes. Thus, in one embodiment,
an improved method of labeling DNA comprises a) providing a reaction mixture comprising dNTPs labeled with
fluorescein family dyes and an enzyme of the invention and b) performing a nucleic acid amplification reaction.

[0019] The enzymes of the invention, and genes encoding these enzymes, provide additional aspects of the invention
which are kits for DNA sequencing that comprise a recombinant enzyme of the invention and may additionally include 
a negatively charged fluorescent terminator compound. Other kits for DNA sequencing comprise a) a negatively 
charged fluorescent terminator compound and b) a native enzyme of the invention. 

[0020] The invention also provides kits for producing labeled DNA which comprise a recombinant enzyme of the
invention. Other kits for producing labeled DNA comprise a) a negatively charged fluorescent nucleoside triphosphate 
compound and b) a native enzyme of the invention. 

Brief Description of the Drawing 

[0021] 

Figure 1 is a schematic representation of the Taq DNA polymerase gene. Restriction sites are indicated that relate 
to Example I and the description of methods for preparing additional mutants and expression vectors provided 
herein.

Detailed Description of the Invention 

[0022] To facilitate understanding of the invention, a number of terms are defined below. 

[0023] The term "gene" refers to a DNA sequence that comprises control and coding sequences necessary for the
production of a recoverable bioactive polypeptide or precursor. The polypeptide can be encoded by a full-length gene 
sequence or by any portion of the coding sequence so long as the enzymatic activity is retained. 
[0024] The term "native" refers to a gene or gene product which is isolated from a naturally occurring source. This 
term also refers to a recombinant form of the native protein produced by molecular biological techniques which has an
amino acid sequence identical to that of the native form.

[0025] The term "mutant" refers to a gene that has been altered in its nucleic acid sequence or a gene product which 
has been altered in its amino acid sequence, resulting in a gene product which may have altered functional properties 
when compared to the native or wild-type gene or gene product. Such alterations include point mutations, deletions and 
insertions. 

[0026] The term "host cell(s)" refers to both single-cellular prokaryote and eukaryote organisms such as bacteria,
yeast, and actinomycetes and single cells from higher order plants or animals when being grown in cell culture. 
[0027] The term "expression system" refers to DNA sequences containing a desired coding sequence and control 
sequences in operable linkage, so that host cells transformed with these sequences are capable of producing the 
encoded proteins. To effect transformation, the expression system may be included on a vector; however, the relevant
DNA may also be integrated into the host chromosome. 

[0028] The term "oligonucleotide" as used herein is defined as a molecule comprised of two or more deoxyribonucleotides
or ribonucleotides, preferably more than three, and usually more than ten. The exact size of an oligonucleotide
will depend on many factors, including the ultimate function or use of the oligonucleotide.

[0029] Oligonucleotides can be prepared by any suitable method, including, for example, cloning and restriction of 
appropriate sequences and direct chemical synthesis by a method such as the phosphotriester method of Narang et
al., 1979, Meth. Enzymol. 68:90-99; the phosphodiester method of Brown et al., 1979, Meth. Enzymol. 68:109-151; the
diethylphosphoramidite method of Beaucage et al., 1981, Tetrahedron Lett. 22:1859-1862; the triester method of
Matteucci et al., 1981, J. Am. Chem. Soc. 103:3185-3191 or automated synthesis methods; and the solid support method
of U.S. Patent No. 4,458,066, which publications are each incorporated herein by reference.
[0030] The term "primer" as used herein refers to an oligonucleotide, whether natural or synthetic, which is capable 
of acting as a point of initiation of synthesis when placed under conditions in which primer extension is initiated. A 
primer is preferably a single-stranded oligodeoxyribonucleotide. The appropriate length of a primer depends on the
intended use of the primer but typically ranges from 15 to 35 nucleotides. Short primer molecules generally require
cooler temperatures to form sufficiently stable hybrid complexes with the template. A primer need not reflect the exact 
sequence of the template but must be sufficiently complementary to hybridize with a template for primer elongation to 
occur. 

[0031] A primer can be labeled, if desired, by incorporating a label detectable by spectroscopic, photochemical, biochemical,
immunochemical, or chemical means. For example, useful labels include 32P, fluorescent dyes, electron-dense
reagents, enzymes (as commonly used in ELISA assays), biotin, or haptens and proteins for which antisera or
monoclonal antibodies are available. 

[0032] The term "thermostable polymerase" refers to an enzyme which is stable to heat, is heat resistant and retains
sufficient activity to effect subsequent primer extension reactions and does not become irreversibly denatured (inactivated)
when subjected to the elevated temperatures for the time necessary to effect denaturation of double-stranded
nucleic acids. The heating conditions necessary for nucleic acid denaturation are well known in the art and are exemplified
in U.S. Patent Nos. 4,683,202 and 4,683,195, which are incorporated herein by reference. As used herein, a thermostable
polymerase is suitable for use in a temperature cycling reaction such as the polymerase chain reaction
("PCR"). Irreversible denaturation for purposes herein refers to permanent and complete loss of enzymatic activity. For
a thermostable polymerase, enzymatic activity refers to the catalysis of the combination of the nucleotides in the proper
manner to form primer extension products that are complementary to a template nucleic acid strand.
[0033] The term "conventional" or "natural" when referring to nucleic acid bases, nucleoside triphosphates, or nucle- 
otides refers to those which occur naturally in the polynucleotide being described (i.e., for DNA these are dATP, dGTP, 
dCTP and dTTP). Additionally, dITP and 7-deaza-dGTP are frequently utilized in place of dGTP, and 7-deaza-dATP can
be utilized in place of dATP in in vitro DNA synthesis reactions, such as sequencing. Collectively these may be referred
to as dNTPs. 

[0034] The term "unconventional" or "modified" when referring to a nucleic acid base, nucleoside, or nucleotide
includes modifications, derivations, or analogues of conventional bases, nucleosides, or nucleotides that naturally occur
in a particular polynucleotide. The deoxyribonucleotide form of uracil is an unconventional or modified base in DNA
(dUMP), whereas the ribonucleotide form of uracil is a conventional base in RNA (UMP). As used herein, unconventional
nucleotides include but are not limited to compounds used as terminators for nucleic acid sequencing. Terminator
compounds include but are not limited to those compounds which have a 2',3'-dideoxy structure and are referred to as
dideoxynucleoside triphosphates. The dideoxynucleoside triphosphates ddATP, ddTTP, ddCTP and ddGTP are referred
to collectively as ddNTPs. Other unconventional nucleotides include phosphorothioate dNTPs ([α-S]dNTPs),
5'-α-borano-dNTPs, α-methyl-phosphonate dNTPs, and ribonucleoside triphosphates (rNTPs). Unconventional bases may
be labeled with radioactive isotopes such as 32P, 33P, or 35S; fluorescent labels; chemiluminescent labels; bioluminescent
labels; hapten labels such as biotin; or enzyme labels such as streptavidin or avidin. Fluorescent labels may
include dyes that are negatively charged, such as dyes of the fluorescein family, or dyes that are neutral in charge, such
as dyes of the rhodamine family, or dyes that are positively charged, such as dyes of the cyanine family. Dyes of the
fluorescein family include e.g., FAM, HEX, TET, JOE, NAN and ZOE. Dyes of the rhodamine family include Texas Red,
ROX, R110, R6G, and TAMRA. FAM, HEX, TET, JOE, NAN, ZOE, ROX, R110, R6G, and TAMRA are marketed by
Perkin-Elmer (Foster City, CA), and Texas Red is marketed by Molecular Probes. Dyes of the cyanine family include Cy2,
Cy3, Cy5, and Cy7 and are marketed by Amersham (Amersham Place, Little Chalfont, Buckinghamshire, England).
[0035] The term "DNA synthesis reaction" refers to methods of producing copies of DNA including but not limited to
PCR, strand displacement amplification, transcription mediated amplification, primer extension and reverse transcription.

[0036] In order to further facilitate understanding of the invention, specific thermostable DNA polymerase enzymes 
and fluorescent dyes are referred to throughout the specification to exemplify the invention, and these references are 
not intended to limit the scope of the invention.

[0037] The present invention provides novel and improved compositions which are thermostable DNA polymerases. 
The enzymes of the invention include recombinant polymerases which more efficiently incorporate nucleoside triphos- 
phates labeled with fluorescein family dyes in comparison to the corresponding wild-type enzymes. The thermostable 
DNA polymerases of the invention are more suitable and desirable for use in processes such as DNA sequencing and 
in vitro synthesis of labeled products than prior art polymerases. Improved DNA sequencing methods of the invention 
include the use of these recombinant polymerases as well as the use of native enzymes which more efficiently incor- 
porate nucleoside triphosphates labeled with fluorescein family dyes than previously characterized enzymes. DNA 
sequences encoding these enzymes, and vectors for expressing the proteins are also provided. 
[0038] The thermostable DNA polymerases of the invention possess a region of criticality within the amino acid 
sequence of the polymerase activity domain of the enzyme. The critical region within the amino acid sequence of a ther- 
mostable DNA polymerase provided by the present invention is shown below using the conventional single-letter amino 
acid code (Lehninger, Biochemistry, New York New York, Worth Publishers Inc., 1970, page 67, which is incorporated 
herein by reference). 

SEQ ID NO: 7 LSXXLX(V/I)PXXE where the "X" at position 4 indicates any amino acid except E. In the three-letter code 
for amino acids, this sequence is represented as LeuSerXaaXaaLeuXaaXaaProXaaXaaGlu (SEQ ID NO: 7) whereby 
"Xaa" at positions 3, 6, 9, and 10 is any amino acid, "Xaa" at position 4 of this sequence is any amino acid but not a 
glutamic acid residue (Glu) and "Xaa" at position 7 is Val or Ile. This region of criticality provides thermostable DNA
polymerase enzymes characterized by the ability to efficiently incorporate nucleotides labeled with fluorescein family 
dyes. 

[0039] For example, in a derivative of the Thermus aquaticus (Taq) DNA polymerase gene which already contains a
glycine to aspartic acid mutation at position 46 (G46D) and an F667Y mutation, a mutation of G to A in the first position 
of the codon for glutamic acid at residue 681 sequence of the full length Taq DNA polymerase sequence (corresponding 
to position 4 of the critical motif) results in an enzyme having the critical motif. This enzyme displays 1) an approximately
2- to 10-fold increase in the efficiency of incorporation of nucleotides labeled with fluorescein family dyes with no
impairment of the enzyme's ability to mediate PCR in the presence of conventional nucleotides and 2) a 3 to 4.3-fold 
increase in the extension rate. In Taq DNA polymerase this particular mutation results in an amino acid change of E 
(glutamic acid) to K (lysine). 
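
The single-base change described above can be checked against the standard genetic code: both glutamic acid codons (GAA and GAG) become lysine codons (AAA and AAG) when the G in the first position is replaced by A. The Python sketch below is illustrative only; this passage does not state which of the two Glu codons occurs at residue 681, so both are shown, and the helper name is hypothetical.

```python
# Subset of the standard genetic code needed for this example (DNA codons).
CODON_TABLE = {"GAA": "E", "GAG": "E", "AAA": "K", "AAG": "K"}


def mutate_first_base(codon, new_base):
    """Return the codon with its first base replaced by new_base."""
    return new_base + codon[1:]


for glu_codon in ("GAA", "GAG"):
    mutated = mutate_first_base(glu_codon, "A")
    print(glu_codon, CODON_TABLE[glu_codon], "->", mutated, CODON_TABLE[mutated])
```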

[0040] Although this particular amino acid change produced the critical motif and significantly alters the ability of the 
enzyme to incorporate unconventional nucleotides, it is expected that the specific change of E to K is not as critical to 
the invention as is the now identified position within the region of criticality. Thus, in a preferred embodiment, the inven- 
tion provides recombinant thermostable DNA polymerase enzymes which are characterized in that a) in its native form 
said polymerase comprises the amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 1), where X is any amino acid; 
b) the X at position 4 in said sequence is mutated in comparison to said native sequence, except that the X at position 
4 is not mutated to E; and c) said thermostable DNA polymerase has reduced discrimination against incorporation of 
nucleotides labeled with fluorescein family dyes in comparison to the native form of said enzyme. In a more preferred 
embodiment, the X at position 4 is replaced by an amino acid having a positive charge, such as K, R or H, or by a polar 
amino acid such as Q or N. In a most preferred embodiment, the X at position 4 is replaced by K. 
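
The preference tiers stated in this paragraph for the replacement residue at position 4 can be summarized as a small lookup. This is a sketch for orientation only: the tier labels and function name are assumptions of the example, and the "more preferred" set contains only the residues the text names explicitly (K, R, H, Q, N), although the text introduces them with "such as".

```python
MOST_PREFERRED = {"K"}
# Residues named in the text: positively charged (K, R, H) or polar (Q, N).
MORE_PREFERRED = {"K", "R", "H", "Q", "N"}
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def position4_tier(residue):
    """Classify a replacement residue at motif position 4 (labels are illustrative)."""
    if residue not in STANDARD_RESIDUES:
        raise ValueError(f"not a standard one-letter amino acid code: {residue}")
    if residue == "E":
        return "excluded"          # X at position 4 is not mutated to E
    if residue in MOST_PREFERRED:
        return "most preferred"
    if residue in MORE_PREFERRED:
        return "more preferred"
    return "within the preferred embodiment"


for r in "KRQAE":
    print(r, position4_tier(r))
```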
[0041] In another preferred embodiment of the invention, the enzyme of the invention is characterized in that the enzyme (a)
has reduced discrimination against fluorescein dye family labeled nucleotides and (b) comprises the amino acid
sequence LS(Q/G)XL(S/A)IPYEE where X is any amino acid (SEQ ID NO: 2). In three-letter code, this amino acid
sequence is represented as follows: LeuSerXaaXaaLeuXaaIleProTyrGluGlu, whereby "Xaa" at position 3 is Gln or Gly,
"Xaa" at position 4 is any amino acid, and "Xaa" at position 6 is Ser or Ala.

[0042] In a more preferred embodiment of the invention, the enzyme having reduced discrimination against fluores- 
cein dye family labeled nucleotides comprises the amino acid sequence LSQXLAIPYEE where X is any amino acid 
(SEQ ID NO: 3). In the three-letter code, this amino acid sequence is represented as
LeuSerGlnXaaLeuAlaIleProTyrGluGlu, whereby "Xaa" at position 4 is any amino acid. In a most preferred embodiment of the invention, the X is a K
residue. 

[0043] In another preferred embodiment of the invention, the enzyme having reduced discrimination against
fluorescein dye family labeled nucleotides comprises the amino acid sequence LSVXLG(V/I)PVKE where X is any amino
acid (SEQ ID NO: 4). In the three-letter code, this amino acid sequence is represented as
LeuSerValXaaLeuGlyXaaProValLysGlu, whereby "Xaa" at position 4 is any amino acid and "Xaa" at position 7 is Val or Ile.
[0044] In a more preferred embodiment of the invention, the enzyme having reduced discrimination against fluores- 
cein dye family labeled nucleotides comprises the amino acid sequence LSVXLGVPVKE where X is any amino acid 
(SEQ ID NO: 5). In the three-letter code, this amino acid sequence is represented as
LeuSerValXaaLeuGlyValProValLysGlu, whereby "Xaa" at position 4 is any amino acid. In a most preferred embodiment, the X is an R residue.
[0045] In another more preferred embodiment, the enzyme having reduced discrimination against fluorescein dye 
family labeled nucleotides comprises the amino acid sequence LSVXLGIPVKE where X is any amino acid (SEQ ID NO: 
6). In the three-letter code, this amino acid sequence is represented as LeuSerValXaaLeuGlyIleProValLysGlu, whereby
"Xaa" at position 4 is any amino acid. In a most preferred embodiment, the X is an R residue. 
[0046] The characterization of the E681K mutation described herein identified a region in the DNA polymerase gene
that affects the ability of the polymerase to interact with negatively charged fluorescent nucleotides. This site, distal to
helix O, is at the end of the Oa helix and the beginning of the Ob helix of the polymerase (Kim et al., 1995, Nature
376:612). Based on molecular modeling principles well-known in the art, changes in the structure of the Oa-Ob helix
other than E to K at position 681 are also expected to produce changes in the ability of the polymerase to discriminate
against nucleotides labeled with fluorescein family dyes. Thus, mutations at positions in the critical motif other than
those in the X residue at position 4 are also within the scope of this invention. In this embodiment, the invention provides
a recombinant thermostable DNA polymerase enzyme which is characterized in that (a) in its native form, the polymerase
comprises the amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 1) where X is any amino acid, (b) the recombinant
polymerase comprises at least one mutation within the amino acid sequence, except that X at position 4 is not
mutated to E, and (c) the enzyme has reduced discrimination against incorporation of nucleotides labeled with
fluorescein family dyes, in comparison to the corresponding native enzyme.

[0047] Similarly, thermostable DNA polymerases that comprise critical motifs that are similar, but not identical, to the
critical motif that is amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 7) where X at position 4 is any amino acid
except E, are within the scope of this invention. Specifically, in one embodiment, the critical motif is the amino acid
sequence LXXXXXXXXXE (SEQ ID NO: 8) where X at position 4 is any amino acid except E. In the three-letter code,
this amino acid sequence is represented as LeuXaaXaaXaaXaaXaaXaaXaaXaaXaaGlu (SEQ ID NO: 8), whereby
"Xaa" at positions 2, 3, 5, 6, 7, 8, 9 and 10 are any amino acid and "Xaa" at position 4 is any amino acid except Glu.
[0048] In another embodiment, the critical motif is amino acid sequence L(S/A)XX(L/I)XXXXXE (SEQ ID NO: 9) where
X at position 4 is any amino acid except E. In the three-letter code, this amino acid sequence is represented as
LeuXaaXaaXaaXaaXaaXaaXaaXaaXaaGlu (SEQ ID NO: 9), whereby "Xaa" at positions 3, 6, 7, 8, 9, and 10 are any amino
acid, "Xaa" at position 2 is Ser or Ala, "Xaa" at position 4 is any amino acid except Glu, and "Xaa" at position 5 is Leu
or Ile.

[0049] In yet another embodiment, the critical motif is amino acid sequence LSXXLXXXXXE (SEQ ID NO: 10) where
X at position 4 is any amino acid except E. In the three-letter code, this amino acid sequence is represented as LeuS- 
erXaaXaaLeuXaaXaaXaaXaaXaaGlu (SEQ ID NO: 10), whereby "Xaa" at positions 3, 6, 7, 8, 9, and 10 are any amino 
acid and "Xaa" at position 4 is any amino acid except Glu. 
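
The motif definitions above translate directly into pattern matching. The following is a minimal sketch, assuming each motif is tested against an 11-residue window written in the one-letter code; the dictionary `MOTIFS` and the function `matching_motifs` are illustrative names that do not appear in the specification.

```python
import re

# Critical-motif variants written as regular expressions (one-letter code).
# "X at position 4 is any amino acid except E" becomes the class [^E];
# every other unconstrained X becomes "." (any residue).
MOTIFS = {
    "SEQ ID NO: 7": r"LS.[^E]L.[VI]P..E",
    "SEQ ID NO: 8": r"L..[^E]......E",
    "SEQ ID NO: 9": r"L[SA].[^E][LI].....E",
    "SEQ ID NO: 10": r"LS.[^E]L.....E",
}


def matching_motifs(window: str) -> list[str]:
    """Return the names of all motifs that an 11-residue window satisfies."""
    return [name for name, pattern in MOTIFS.items()
            if re.fullmatch(pattern, window)]


# Native Thermus aquaticus window (Glu at position 4) and the E-to-K variant.
print(matching_motifs("LSQELAIPYEE"))
print(matching_motifs("LSQKLAIPYEE"))
```

The native window fails every pattern because position 4 is Glu, whereas the Lys variant satisfies all four, which mirrors the nesting of the progressively broader motif definitions.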

[0050] The ability of the enzymes of this invention to efficiently incorporate nucleotides labeled with fluorescein family
dyes is measured by ddNTP incorporation assays. One such assay is a primer extension competition assay conducted
under conditions of limiting template. In this assay, a primer DG48 (5'-GGGAAGGGCGATCGGTGCGGGCCTCTTCGC)
(SEQ ID NO: 11), bound to M13mp18 template (Innis et al., 1988, Proc. Natl. Acad. Sci. USA 85:9436), is
extended in the presence of [α-33P]dCTP and excess enzyme with various levels of a fluorescently labeled ddNTP,
Zowie-ddCTP. Because the incorporation of a ddCTP residue terminates the extension reaction, the more readily a
DNA polymerase incorporates a ddCTP into an extended primer, the less [α-33P]dCTP can be incorporated. Thus, as
the efficiency of fluorescently labeled ddCTP incorporation increases, the extent of inhibition of DNA synthesis is
increased. The reactions were also performed with various levels of an unlabeled ddCTP. The concentrations of ddCTP
and Zowie-ddCTP needed for 50% inhibition were calculated and compared to give a relative measure of the ability of
the enzyme to incorporate the fluorescently labeled nucleotide. The details of the ddNTP incorporation assay are
provided in Example II.

[0051] Thus, in one embodiment of the invention, the characteristic of reduced discrimination against incorporation of
nucleotides labeled with fluorescein family dyes is measured by the fluorescent ddNTP incorporation assay described
in Example II. In a preferred embodiment, the concentration of a ddNTP labeled with a fluorescein dye, Zowie-ddCTP,
required for 50% inhibition of DNA synthesis is reduced at least 3-fold for a mutant enzyme of the invention, relative to
the wild-type enzyme. In a more preferred embodiment, the concentration is reduced at least 5-fold. In a most preferred
embodiment, the concentration is reduced at least 10-fold. In another embodiment, the characteristic of reduced
discrimination is assayed by measuring fluorescent dNTP incorporation.
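
The comparison of 50% inhibition concentrations can be sketched numerically. The example below assumes that each titration is summarized as terminator concentrations paired with the fraction of uninhibited synthesis remaining, and that the 50% point is estimated by log-linear interpolation; both the interpolation method and all numbers are illustrative assumptions, not values or procedures taken from Example II.

```python
import math


def ic50(concs, activity):
    """Estimate the concentration giving 50% inhibition.

    concs    -- terminator concentrations in ascending order
    activity -- fraction of uninhibited synthesis at each concentration
    Interpolates linearly in log10(concentration) between the two points
    that bracket 50% activity.
    """
    points = list(zip(concs, activity))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 0.5 >= a_hi and a_lo != a_hi:
            frac = (a_lo - 0.5) / (a_lo - a_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("50% inhibition is not bracketed by the data")


def fold_reduction(wild_type_ic50, mutant_ic50):
    """How many-fold less labeled ddNTP the mutant needs for 50% inhibition."""
    return wild_type_ic50 / mutant_ic50


# Hypothetical titrations of the labeled terminator (arbitrary units).
wt = ic50([1, 10, 100, 1000], [0.95, 0.80, 0.40, 0.10])
mutant = ic50([1, 10, 100, 1000], [0.70, 0.30, 0.05, 0.01])
fold = fold_reduction(wt, mutant)
print(f"fold reduction: {fold:.1f}; meets 3-fold criterion: {fold >= 3}")
```

Interpolating on a logarithmic concentration axis is a common convention for dose-response data; a fitted curve could be substituted without changing the fold-reduction comparison.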

[0052] In another aspect of the invention, the thermostable DNA polymerase gene sequence and enzyme are derived
from various thermophilic species. In one embodiment, the polymerase gene sequence and enzyme are from a species
of the genus Thermus. In other embodiments of the invention, the gene sequence and enzyme are from thermophilic
species other than Thermus. The full nucleic acid and amino acid sequences for numerous thermostable DNA polymerases
are available. The sequences of each of the Taq, Thermus thermophilus (Tth), Thermus species Z05, Thermus species
sps17, Thermotoga maritima (Tma), and Thermosipho africanus (Taf) polymerases have been published in PCT
International Patent Application No. PCT/US91/07035, which was published as PCT Patent Publication No. WO 92/06200 on
April 16, 1992, and is incorporated herein by reference. The sequences for the DNA polymerases from Thermus flavus,
Bacillus caldotenax, and Bacillus stearothermophilus have been published in Akhmetzjanov and Vakhitov, 1992,
Nucleic Acids Research 20 (21):5839, Uemori et al., 1993, J. Biochem. 113:401-410, and as accession number
BSU23149.ng from the NG:New GenBank database, respectively, which are each incorporated herein by reference.
The sequence of the thermostable DNA polymerase from Thermus caldophilus is found in EMBL/GenBank Accession
No. U62584. The sequence of the thermostable DNA polymerase from Thermus filiformis can be recovered from ATCC
Deposit No. 42380 using the methods provided in U.S. Patent No. 4,889,818, as well as the sequence information
provided in Table I. The sequence of the Thermotoga neapolitana DNA polymerase is from GeneSeq Patent Data Base
Accession No. R98144 (also disclosed in PCT WO 97/09451).

Table I

Organism                        Critical Motif                                Critical Amino Acid Position

Consensus                       L   S/a -   -   L/i -   -   -   -   -   E
Thermus aquaticus               L   S   Q   E   L   A   I   P   Y   E   E     681
Thermus flavus                  L   S   G   E   L   S   I   P   Y   E   E     679
Thermus thermophilus            L   S   Q   E   L   A   I   P   Y   E   E     683
Thermus species Z05             L   S   Q   E   L   A   I   P   Y   E   E     683
Thermus species sps17           L   S   Q   E   L   S   I   P   Y   E   E     679
Thermus caldophilus             L   S   Q   E   L   A   I   P   Y   E   E     683
Thermus filiformis              L   S   Q   E   L   S   I   P   Y   E   E     679
Thermotoga maritima             L   S   V   R   L   G   V   P   V   K   E     744
Thermotoga neapolitana          L   S   V   R   L   G   I   P   V   K   E     744
Thermosipho africanus           L   S   K   R   I   G   L   S   V   S   E     743
Bacillus caldotenax (1)         L   A   Q   N   L   N   I   S   R   K   E     725, 724
Bacillus stearothermophilus (2) L   A   Q   N   L   N   I   S   R   K   E     724, 727, 802


1. Protein sequence from Accession No. D12982, Uemori T., Ishino Y., Fujita K., Asada
K., Kato I. "Cloning of the DNA polymerase gene of Bacillus caldotenax and
characterization of the gene product" J. Biochem. 113:401 (1993). The critical residue in
that sequence is 725. An almost identical protein sequence is provided as a putative
"Bacillus stearothermophilus" DNA Polymerase in Accession No. R45155 and WPI 93-408323/51.
The critical residue in that sequence is 724.
2. There are several sequence submissions for Bacillus stearothermophilus DNA
polymerase in the GenBank or SwissProt/PIR databases. Although these sequences are
highly related, they differ somewhat from one another; each contains the identical
L(S/A)XX(L/I)XXXXXE (SEQ ID NO: 9) motif, where X is any amino acid except E. In
the three-letter code, this amino acid sequence is represented as
LeuXaaXaaXaaXaaXaaXaaXaaXaaXaaGlu (SEQ ID NO: 9), whereby "Xaa" at positions 3,
6, 7, 8, 9, and 10 are any amino acid, "Xaa" at position 2 is Ser or Ala, "Xaa" at position 4
is any amino acid except Glu, and "Xaa" at position 5 is Leu or Ile.

In the table above, protein sequences comprising the Critical Residue in the Critical Motif at position 724 are provided
by Japanese patent publication J 05 304 964A, EP No. 699,760, and Accession No. U33536. Another highly related,
but somewhat different, protein sequence, published in Gene 153:65-68 (1995), contains the Critical Residue in the
Critical Motif at position 727. Another highly related, but somewhat different, protein sequence, Accession No. U23149,
for Bst DNA polymerase contains the Critical Residue in the Critical Motif at position 802.

[0053] Because the DNA polymerases of each thermophilic species are unique, the amino acid position of the region 
of criticality is distinct for each enzyme. Amino acid and nucleic acid sequence alignment programs are readily available 
and, given the particular region identified herein, serve to assist in the identification of the exact sequence region of the 
invention. Such sequence alignment programs are available from the Genetics Computer Group, 575 Science Drive,
Madison, Wisconsin. Given the particular motif identified herein, these programs, including "GAP," "BESTFIT," and
"PILEUP," serve to assist in the localization of the critical motif. The position of the regions of criticality are shown in
Table I for thermostable DNA polymerases from exemplary thermophilic species. 
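
For a sequence that already contains the conserved residues, the localization step can be approximated without a full alignment by scanning for the consensus of Table I. The sketch below is a simplification offered under that assumption and is not a substitute for the GCG programs named above; the function name and the toy sequence are hypothetical.

```python
import re

# Consensus from Table I: L, S/A, any, any, L/I, five further residues, E.
# The pattern is wrapped in a lookahead so overlapping candidates are kept.
CONSENSUS = re.compile(r"(?=(L[SA]..[LI].....E))")


def find_critical_residues(protein: str):
    """Return (position, residue, motif) for each consensus match.

    position is the 1-based index, within the full sequence, of the
    residue occupying position 4 of the motif.
    """
    hits = []
    for match in CONSENSUS.finditer(protein):
        motif = match.group(1)
        position = match.start() + 4  # 0-based start + 3, converted to 1-based
        hits.append((position, motif[3], motif))
    return hits


# Toy sequence (hypothetical, not a real polymerase): ten filler residues
# followed by the Thermus aquaticus motif from Table I.
toy = "M" * 10 + "LSQELAIPYEE" + "AAA"
print(find_critical_residues(toy))
```

A consensus this loose can match unrelated regions of a long protein, so any hit would still need to be confirmed against an alignment with one of the polymerases listed in Table I.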

[0054] Regardless of the exact position of the critical motif LSXXLX(V/I)PXXE (SEQ ID NO: 7), where X at position 4 
is any amino acid except E, within the polymerase domain of a thermostable DNA polymerase, the presence of the motif 
serves to provide thermostable DNA polymerases having the ability to efficiently incorporate nucleotides labeled with 
fluorescein family dyes. Therefore, mutation of the conserved glutamic acid of the thermostable DNA polymerases of 
Thermus flavus (Glu 679), Thermus thermophilus (Glu 683), Thermus species Z05 (Glu 683), Thermus species sps17
(Glu 679), Thermus caldophilus (Glu 683), and Thermus filiformis (Glu 679) to produce the critical motif will provide an
enhancing effect on the ability of the polymerase to efficiently incorporate nucleotides labeled with fluorescein family 
dyes. 

[0055] In addition, in view of the highly conserved nature of the now identified critical motif, novel thermostable DNA
polymerases may be identified based upon their homology to, for example, Taq DNA polymerase or the sequences of
other DNA polymerases in Table I (see for example U.S. Patent Nos. 5,618,711 and 5,624,833, which are herein
incorporated by reference). Such polymerases, so long as their peptide sequence is at least 45% and most preferably
greater than 80% homologous to the Taq polymerase amino acid sequence, as determined by the methods described
herein, are within the scope of the present invention. Consequently, the invention relates to a class of enzymes which
also includes, for example, the thermostable DNA polymerase, and corresponding gene and expression vectors, from
Thermus oshimai (Williams RA, et al., 1996, Int. J. Syst. Bacteriol. 46 (2): 403-408); Thermus silvanus and Thermus
chliarophilus (Tenreiro S, et al., 1995, Int. J. Syst. Bacteriol. 45 (4): 633-639); Thermus scotoductus (Tenreiro S, et al.,
1995, Res. Microbiol. 146 (4): 315-324); Thermus ruber ATCC 35948 (L.G. Loginova, 1984, Int. J. Syst. Bacteriol. 34:
498-499); and Thermus brockianus (Munster, M.J., 1986, J. Gen. Microbiol. 132: 1677), which publications are each
incorporated herein by reference.

[0056] Those of skill in the art will recognize that the above thermostable DNA polymerases with enhanced efficiency
for incorporating fluorescein-labeled nucleotides are most easily constructed by recombinant DNA techniques such as
site-directed mutagenesis. See for example Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring
Harbor, 1989, second edition, chapter 15.51, "Oligonucleotide-mediated mutagenesis," which is incorporated herein by
reference. This technique is now standard in the art, and can be used to create all possible classes of base pair
changes at any determined site in a gene. The method is performed using a synthetic oligonucleotide primer
complementary to a single-stranded phage or plasmid DNA to be mutagenized except for a limited mismatching, which
represents the desired mutation. Briefly, the synthetic oligonucleotide is used as a primer to direct synthesis of a strand
complementary to the phage or plasmid, and the resulting double-stranded DNA is transformed into a phage- or
plasmid-supporting host bacterium. The resulting bacteria can be assayed by, for example, DNA sequence analysis or
probe hybridization to identify those plaques or colonies carrying the desired mutated gene sequence.
[0057] Subsequent to the invention of PCR, primer-directed mutagenesis (described in U.S. Patent No. 4,683,195,
which is herein incorporated by reference) and "overlap PCR" (Higuchi, 1989, in PCR Technology, ed. Erlich, Stockton
Press, New York, NY, pp. 61-70) have become routine means of introducing any mutation at any position of a gene.
[0058] The mutated DNA can be recovered from the plasmid, phasmid, phage or amplification reaction by conventional
means and ligated into an expression vector for subsequent culture and purification of the resulting enzyme.
Numerous cloning and expression vectors are suitable for practicing the invention, including mammalian and bacterial
systems, as described in, for example, Sambrook et al., 1989, supra. For convenience, the present invention is
exemplified utilizing the lambda-derived PL promoter (Shimatake et al., 1981, Nature 292:128). Use of this promoter is
specifically described in U.S. Patent Nos. 4,711,845 and 5,079,352, which are incorporated herein by reference.
[0059] Plasmid pCS1 has been deposited with the ATCC, on August 28, 1997, and given accession No. 98521. This
plasmid contains a gene encoding a thermostable DNA polymerase, which gene is mutated at the codon at position 681
such that glutamic acid is replaced with lysine in the resulting polypeptide, and provides a means for providing
thermostable DNA polymerases having an enhanced efficiency for incorporating nucleotides labeled with fluorescein family
dyes. Example I illustrates the use of flanking restriction sites suitable for subcloning the E681K mutation to create other
thermostable DNA polymerase enzymes. Alternatively, because the complete gene sequences for numerous thermostable
DNA polymerases are known, other means for introducing the E681K mutation, such as restriction digestion and
fragment replacement, are readily available to those of skill in the art, having the availability of ATCC deposits and the
sequence information provided herein.

[0060] When one desires to produce one of the mutant or native enzymes of the present invention, or a derivative or
homologue of those enzymes, the production of the enzyme typically involves the transformation of a host cell with the
expression vector, and culture of the transformed host cell under conditions such that expression will occur. Means for
transforming and culturing transformed host cells are well known in the art and are described in detail in, for example,
Sambrook et al., 1989, supra.

[0061] The thermostable DNA polymerases of the present invention are generally purified from E. coli strain DG116
(deposited as ATCC 53606 on April 7, 1987) which has been transformed with an expression vector operably linked to
a gene encoding a wild-type or modified thermostable DNA polymerase. Methods for purifying the thermostable DNA
polymerase are described in, for example, Example I and Lawyer et al., 1993, PCR Methods and Applications 2:275-87,
which is incorporated herein by reference.

[0062] The thermostable enzymes of the invention may be used for any purpose in which such enzyme activity is
necessary or desired. Examples of uses include DNA sequencing, DNA labeling, and labeling of primer extension products.
DNA sequencing by the Sanger dideoxynucleotide method (Sanger et al., 1977, Proc. Natl. Acad. Sci. 74:5463) is
particularly improved by the present invention. Advances in the basic Sanger et al. method have provided novel vectors
(Yanisch-Perron et al., 1985, Gene 33:103-119) and base analogues (Mills et al., 1979, Proc. Natl. Acad. Sci. 76:2232-2235,
and Barr et al., 1986, Biotechniques 4:428-432). In general, DNA sequencing requires template-dependent
primer extension in the presence of chain-terminating base analogs, resulting in a distribution of partial fragments which
are subsequently separated by size. The basic dideoxy sequencing procedure involves (i) annealing an oligonucleotide
primer, optionally labeled, to a template; (ii) extending the primer with DNA polymerase in four separate reactions, each
containing a mixture of unlabeled dNTPs and a limiting amount of one chain terminating agent such as a ddNTP,
optionally labeled; and (iii) resolving the four sets of reaction products on a high-resolution denaturing polyacrylamide/urea
gel. The reaction products can be detected in the gel by autoradiography or by fluorescence detection, depending on
the label used, and the image can be examined to infer the nucleotide sequence. These methods utilize DNA polymerase
such as the Klenow fragment of E. coli Pol I or a modified T7 DNA polymerase.
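
The logic of steps (ii) and (iii) can be illustrated with an idealized model. The sketch assumes error-free extension and that every possible termination product is present and resolved by length; the function names and the example sequence are illustrative only.

```python
def sanger_ladders(extended: str, primer_len: int) -> dict[str, list[int]]:
    """Fragment lengths expected in each of the four termination reactions.

    extended   -- bases added to the primer, in order of synthesis
    primer_len -- length of the primer, which every fragment includes
    A fragment ending at the i-th added base appears in the reaction
    whose ddNTP corresponds to that base.
    """
    ladders = {base: [] for base in "ACGT"}
    for i, base in enumerate(extended, start=1):
        ladders[base].append(primer_len + i)
    return ladders


def read_ladders(ladders: dict[str, list[int]]) -> str:
    """Read the sequence by ordering all fragments from shortest to longest."""
    by_length = {length: base
                 for base, lengths in ladders.items()
                 for length in lengths}
    return "".join(by_length[n] for n in sorted(by_length))


ladders = sanger_ladders("GATTACA", primer_len=20)
print(ladders)
print(read_ladders(ladders))
```

Reading the four ladders in order of fragment length recovers the extended sequence, which is the inference made from the gel image in step (iii).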
[0063] The availability of thermoresistant polymerases, such as Taq DNA polymerase, resulted in improved methods
for sequencing with thermostable DNA polymerase (see Innis et al., 1988, supra) and modifications thereof referred to
as "cycle sequencing" (Murray, 1989, Nuc. Acids Res. 17:8889). As an alternative to basic dideoxy sequencing, cycle
sequencing is a linear, asymmetric amplification of target sequences complementary to the template sequence in the
presence of chain terminators. A single cycle produces a family of extension products of all possible lengths. Following
denaturation of the extension reaction product from the DNA template, multiple cycles of primer annealing and primer
extension occur in the presence of terminators such as ddNTPs. Cycle sequencing requires less template DNA than
conventional chain-termination sequencing. Thermostable DNA polymerases have several advantages in cycle
sequencing: they tolerate the stringent annealing temperatures which are required for specific hybridization of primer to
nucleic acid targets, and they tolerate the high-temperature denaturation, i.e., 90-95°C, which occurs in each cycle.
For this reason, AmpliTaq® DNA Polymerase and its derivatives and descendants have been included
in Taq cycle sequencing kits commercialized by companies such as Perkin-Elmer, Norwalk, CT.
[0064] Two variations of chain termination sequencing methods exist: dye-primer sequencing and dye-terminator
sequencing. In dye-primer sequencing, the ddNTP terminators are unlabeled, and a labeled primer is utilized to detect
extension products (Smith et al., 1986, Nature 321:674-679). In dye-terminator DNA sequencing, a DNA polymerase is
used to incorporate dNTPs and fluorescently labeled ddNTPs onto the end of a DNA primer (Lee et al., supra). This
process offers the advantage of not having to synthesize dye-labeled primers. Furthermore, dye-terminator reactions
are more convenient in that all four reactions can be performed in the same tube.

[0065] Both dye-primer and dye-terminator methods may be automated using an automated sequencing instrument
produced by Applied Biosystems, Foster City, CA. (U.S. Patent No. 5,171,534, which is herein incorporated by refer- 
ence). When using the instrument, the completed sequencing reaction mixture is fractionated on a denaturing polyacr- 
ylamide gel mounted in the instrument. A laser at the bottom of the instrument detects the fluorescent products as they 
are electrophoresed according to size through the gel. 

[0066] Two types of fluorescent dyes are commonly used to label the terminators used for dye-terminator sequencing:
negatively charged and zwitterionic fluorescent dyes. Negatively charged fluorescent dyes include those of the
fluorescein and BODIPY families. BODIPY dyes (4,4-difluoro-4-bora-3a,4a-diaza-s-indacene) are described in
International Patent Application, Publication No. WO 97/00967, which is incorporated herein by reference. Zwitterionic
fluorescent dyes include those of the rhodamine family. Commercially available cycle sequencing kits use terminators
labeled with rhodamine derivatives. However, the rhodamine-labeled terminators are rather costly and the product must
be separated from unincorporated dye-ddNTPs before loading on the gel since they co-migrate with the sequencing
products. Rhodamine dye family terminators seem to stabilize hairpin structures in GC-rich regions, which causes the
products to migrate anomalously. This requires the use of dITP, which relaxes the secondary structure but also affects
the efficiency of incorporation of terminator.

[0067] In contrast, fluorescein-labeled terminators eliminate the separation step prior to gel loading since they have
a greater net negative charge and migrate faster than the sequencing products. In addition, fluorescein-labeled
sequencing products have better electrophoretic migration than sequencing products labeled with rhodamine.
Although wild-type Taq DNA polymerase does not efficiently incorporate terminators labeled with fluorescein family 
dyes, this can now be accomplished efficiently by use of the modified enzymes provided herein. 

[0068] Thus, the scope of this invention includes novel methods for dideoxy sequencing using enzymes having the
critical motif, as well as kits for performing the method. In one embodiment, the sequencing method of the invention
comprises:

a) providing a recombinant thermostable DNA polymerase enzyme which is characterized in that

i) in its native form said polymerase comprises the amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 1),
where X is any amino acid,

ii) the X at position 4 in said sequence is mutated in comparison to said native sequence, except that X is not
mutated to E; and

iii) said thermostable DNA polymerase has reduced discrimination against incorporation of nucleotides labeled
with fluorescein family dyes in comparison to the native form of said enzyme; and

b) performing a dye-terminator sequencing reaction.

[0069] In a preferred embodiment of the above method, the native form enzyme has the amino acid sequence
LS(Q/G)XL(S/A)IPYEE (SEQ ID NO: 2), where X is any amino acid. In the three-letter code, this amino acid sequence
is represented as LeuSerXaaXaaLeuXaaIleProTyrGluGlu (SEQ ID NO: 2), whereby "Xaa" at position 3 is Gln or Gly,
"Xaa" at position 4 is any amino acid, and "Xaa" at position 6 is Ser or Ala. In a more preferred embodiment, the native
form amino acid sequence is LSQXLAIPYEE (SEQ ID NO: 3), where X is any amino acid. In the three-letter code, this
amino acid sequence is represented as LeuSerGlnXaaLeuAlaIleProTyrGluGlu (SEQ ID NO: 3), whereby "Xaa" at
position 4 is any amino acid. In a most preferred embodiment, the "Xaa" at position 4 is Lys.

[0070] As described above, DNA sequencing with thermostable DNA polymerases requires a mixture of unconventional
base analogues that act as chain-terminators and conventional nucleotides at a specified ratio of concentrations
that ensures that a population of extension products is generated representing all possible fragment lengths over
a distance of several hundred bases. Some thermostable DNA polymerases previously used for sequencing, such as
wild-type Taq polymerase, are characterized in that they preferentially incorporate conventional nucleotides in the
presence of a mixture of conventional and unconventional nucleotides. However, some recently described thermostable
DNA polymerases allow the ratio of unconventional base analogues to conventional bases to be reduced from a
hundred- to several hundred-fold, or up to over a thousand-fold.
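
The link between the concentration ratio, the discrimination of the enzyme, and the spread of fragment lengths can be sketched with a simple competition model. This is an illustrative assumption and not a model stated in the specification: at each site the terminator is taken to compete with the conventional nucleotide, with the enzyme preferring the conventional nucleotide by a constant factor called `selectivity` here, and the same termination probability is applied at every position.

```python
import math


def termination_probability(ratio: float, selectivity: float) -> float:
    """Chance that a given incorporation event is a chain termination.

    ratio       -- [ddNTP] / [dNTP] in the reaction
    selectivity -- fold preference of the enzyme for dNTP over ddNTP
                   (1.0 means no discrimination)
    """
    return ratio / (ratio + selectivity)


def mean_fragment_length(p: float) -> float:
    """Mean number of bases added before termination (geometric model)."""
    return 1.0 / p


# A strongly discriminating enzyme and a non-discriminating one reach the
# same termination probability at very different ddNTP:dNTP ratios.
p_discriminating = termination_probability(ratio=10.0, selectivity=1000.0)
p_nondiscriminating = termination_probability(ratio=0.01, selectivity=1.0)
print(p_discriminating, p_nondiscriminating)
print(mean_fragment_length(p_nondiscriminating))
```

Under this model a thousand-fold drop in selectivity permits a thousand-fold lower terminator ratio for the same ladder, which is the qualitative point made in the paragraph above; real enzymes also show sequence-context effects that a single constant does not capture.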

[0071] One such polymerase is the F667Y mutant of Taq DNA polymerase. Another such mutant is a Taq DNA
polymerase having an F667Y mutation and a mutation at position 46 which changes a glycine residue to an aspartic
acid residue (the G46D mutation). This mutant polymerase, known as AmpliTaq®, FS, is manufactured by Hoffmann-La
Roche and marketed by Perkin-Elmer. F730Y Tma30 DNA Polymerase is another such polymerase. This mutant
polymerase is a combination of 1) nucleotides 1-570 of Taq DNA polymerase modified to encode a G46D mutation and
2) nucleotides 571-2679 of Tma DNA polymerase modified to encode an aspartic acid to alanine mutation at position
323, a glutamic acid to alanine mutation at position 325, and a phenylalanine to tyrosine mutation at position 730 (U.S.
Patent Application Serial No. 60/052065 and European Patent Application No. 98112327.6, both hereby incorporated
by reference). Another polymerase that incorporates unconventional base analogues is a F730Y mutant DNA polymerase
from Thermotoga neapolitana (International Patent Application, Publication Nos. WO 96/10640, WO 96/41014, and
WO 97/09451, which are hereby incorporated by reference). Using these enzymes, for a given dNTP concentration, the
rhodamine-ddNTP concentration can be decreased by about 50- to several hundred-fold compared to thermostable
DNA polymerases previously available.

[0072] The E681K mutation of the invention was combined using recombinant DNA methods with an F667Y mutation
to produce the double mutant Taq DNA polymerase enzyme used in the sequencing reactions described in Example
IV. The double mutant was used in a dye-terminator sequencing reaction with fluorescein-labeled dye terminators. The
results, described in Example IV, show that the enzyme is capable of incorporating fluorescein-labeled dye terminators
in a sequencing reaction and produces sequencing ladders that can be accurately read in an automated sequencing
instrument. Unexpectedly, the combination of the E681K and the F667Y mutations was also found to produce a
thermostable DNA polymerase enzyme with a 3- to 4-fold increased extension rate relative to an enzyme with the F667Y
mutation alone, as measured by the assay described in Example III.

[0073] Thus, in another aspect of this invention, the critical motif identified in this invention can be combined with
motifs conferring reduced discrimination against ddNTPs to produce polymerases having an increased efficiency of
incorporation of both labeled and unlabeled ddNTPs. These polymerases are useful in DNA sequencing methods. In
one embodiment of the present invention, a thermostable DNA polymerase having the critical motif defined herein also
comprises the critical motif that includes the F667Y mutation, described in U.S. Serial No. 08/448,223. In this
embodiment, the thermostable DNA polymerase is characterized in that

i) in its native form said polymerase comprises a first amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 1),
where X is any amino acid,

ii) the X at position 4 in said first amino acid sequence is mutated in comparison to said native sequence, except
that X is not mutated to E;

iii) said thermostable DNA polymerase has reduced discrimination against incorporation of nucleotides labeled with
fluorescein family dyes in comparison to the native form of said enzyme;

iv) said polymerase comprises a second amino acid sequence MRRXXKXXNYXXXYG (SEQ ID NO: 12) where X
is any amino acid; and

v) said thermostable DNA polymerase also has reduced discrimination against incorporation of unconventional
nucleotides in comparison to the native form of said enzyme.

In the three-letter code, the second amino acid sequence is represented by
MetArgArgXaaXaaLysXaaXaaAsnTyrXaaXaaXaaTyrGly (SEQ ID NO: 12), where
"Xaa" at positions 4, 5, 7, 8, 11, 12, and 13 is any amino acid. In a preferred embodiment, the "Xaa" at position 4
in the first amino acid sequence is mutated to Lys. In a more preferred embodiment, the enzyme is Taq DNA
polymerase and it comprises the E681K and F667Y mutations. Also within the scope of this invention are methods
of sequencing using the above polymerases.


[0074] Also within the scope of this invention is the improved sequencing method of the invention performed using
thermostable DNA polymerase enzymes having a critical motif which is not derived by mutation, but which critical motif
exists as a natural variant. In this aspect, the DNA polymerase of a thermophilic bacterial species has a critical motif in
which the residue at position 4 is not Glu. For example, in the thermostable DNA polymerase from Thermotoga
neapolitana, the X at position 4 in the motif LSXXLX(V/I)PXXE (SEQ ID NO: 7), where X is any amino acid except E, is
an arginine residue. Thus, the invention provides for improved methods of DNA sequencing using a native thermostable
DNA polymerase which comprises the amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 7) where X can be any
amino acid except E. In the three-letter code, this amino acid sequence is represented as
LeuSerXaaXaaLeuXaaXaaProXaaXaaGlu (SEQ ID NO: 7), where "Xaa" at positions 3, 6, 9, and 10 are any amino acid,
"Xaa" at position 4 is any amino acid except Glu, and "Xaa" at position 7 is Val or Ile. In this embodiment, the
sequencing method of the invention comprises:

a) providing a thermostable DNA polymerase which is characterized in that

i) said polymerase comprises the amino acid sequence
LSXXLX(V/I)PXXE (SEQ ID NO: 7), where X at position 4 is any amino acid except E,

ii) said thermostable DNA polymerase has reduced discrimination against incorporation of nucleotides labeled
with fluorescein family dyes; and

b) providing a dye-terminator labeled with a fluorescein family dye, and

c) performing a dye-terminator sequencing reaction.

[0075] In a more preferred embodiment, the sequencing method of the invention comprises:

a) providing a thermostable DNA polymerase which is characterized in that

i) said polymerase comprises a first amino acid sequence
LSXXLX(V/I)PXXE (SEQ ID NO: 7), where X at position 4 is any amino acid except E,

ii) said thermostable DNA polymerase has reduced discrimination against incorporation of nucleotides labeled
with fluorescein family dyes;

iii) said polymerase comprises a second amino acid sequence MRRXXKXXNYXXXYG (SEQ ID NO: 12) where
X is any amino acid;

iv) said thermostable DNA polymerase has reduced discrimination against incorporation of unconventional
nucleotides; and

b) providing a dye-terminator labeled with a fluorescein family dye, and

c) performing a dye-terminator sequencing reaction.

[0076] In another preferred embodiment, the enzyme comprises the amino acid sequence LS(Q/G)XL(S/A)IPYEE
(SEQ ID NO: 13), where X at position 4 is any amino acid except E. In the three-letter code, this amino acid sequence
is represented as LeuSerXaaXaaLeuXaaIleProTyrGluGlu (SEQ ID NO: 13), whereby "Xaa" at position 3 is Gln or Gly,
"Xaa" at position 4 is any amino acid except Glu, and "Xaa" at position 6 is Ser or Ala. In a more preferred embodiment,
the amino acid sequence is LSQXLAIPYEE (SEQ ID NO: 14), where X is any amino acid except E. In the three-letter
code, this amino acid sequence is represented as LeuSerGlnXaaLeuAlaIleProTyrGluGlu (SEQ ID NO: 14), whereby
"Xaa" at position 4 is any amino acid except Glu.

[0077] In yet another preferred embodiment, the enzyme has the amino acid sequence LSVXLG(V/I)PVKE (SEQ ID
NO: 15), where X is any amino acid except E. In the three-letter code, this amino acid sequence is represented as
LeuSerValXaaLeuGlyXaaProValLysGlu (SEQ ID NO: 15), whereby "Xaa" at position 4 is any amino acid except Glu and
"Xaa" at position 7 is Val or Ile. In a more preferred embodiment, the amino acid sequence is LSVXLGVPVKE (SEQ ID
NO: 16), where X is any amino acid except E. In the three-letter code, this amino acid sequence is represented as
LeuSerValXaaLeuGlyValProValLysGlu (SEQ ID NO: 16), whereby "Xaa" at position 4 is any amino acid except Glu. In
a most preferred embodiment, the "Xaa" at position 4 is Arg. In another more preferred embodiment, the amino acid
sequence is LSVXLGIPVKE (SEQ ID NO: 17), where X is any amino acid except E. In the three-letter code, this amino
acid sequence is represented as LeuSerValXaaLeuGlyIleProValLysGlu (SEQ ID NO: 17), whereby "Xaa" at position 4
is any amino acid except Glu. In a most preferred embodiment, the "Xaa" at position 4 is Arg.
[0078] In another embodiment of the invention, the sequencing methods are performed using a native enzyme which
has a reduced level of discrimination against nucleotides labeled with fluorescein family dyes, which level is measured
using a ddNTP incorporation assay such as that described in Example II. The concentration of ddCTP required for 50%
inhibition of DNA synthesis is determined, as is the concentration of Zowie-ddCTP needed for 50% inhibition. The ratio
of the concentration for Zowie-ddCTP to the concentration for ddCTP is calculated. In a preferred embodiment, the ratio
is 10 or less. In a more preferred embodiment, the ratio is 4 or less. In a most preferred embodiment, the ratio is 1.2 or
less.
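The ratio test described in this paragraph can be sketched as follows. The thresholds of 10, 4 and 1.2 come from the text; the function names and tier labels are illustrative assumptions.

```python
def discrimination_ratio(ic50_labeled: float, ic50_unlabeled: float) -> float:
    """Ratio of the labeled-ddCTP concentration giving 50% inhibition to the
    unlabeled-ddCTP concentration giving 50% inhibition (same units)."""
    if ic50_labeled <= 0 or ic50_unlabeled <= 0:
        raise ValueError("concentrations must be positive")
    return ic50_labeled / ic50_unlabeled


def preference_tier(ratio: float) -> str:
    """Map a ratio onto the tiers named in the text (10, 4 and 1.2 or less)."""
    if ratio <= 1.2:
        return "most preferred"
    if ratio <= 4:
        return "more preferred"
    if ratio <= 10:
        return "preferred"
    return "outside the preferred range"


if __name__ == "__main__":
    for ratio in (25, 4, 1.2):
        print(ratio, "->", preference_tier(ratio))
```

A lower ratio means the enzyme discriminates less against the dye-labeled terminator; a ratio of 1 would indicate equal efficiency for labeled and unlabeled ddCTP.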

[0079] Although the examples provided herein use dideoxynucleotides labeled with fluorescein family dyes, the use
of other unconventional nucleotides and fluorescent dyes is also within the scope of this invention. Other unconventional
nucleotides include fluorescently-labeled dNTPs, which can be used to label the products of DNA synthesis, and
fluorescently-labeled rNTPs, which can be used to label the primer extension products. Other dyes include other negatively
charged fluorescent dyes, such as BODIPY, which are structurally and chemically similar to fluorescein. Other
dyes also include cyanine dyes. Cyanine-labeled dNTPs were added to a standard PCR reaction which included a Taq
DNA polymerase with the E681K mutation (and a G46D mutation). The cyanine-labeled dNTPs were unexpectedly
found to be incorporated into amplification products at a level that was higher than for the wild-type enzyme. Thus, in
this aspect, a method of labeling DNA of the invention uses a native or mutant polymerase of the invention in combination
with a nucleotide labeled with either a negatively charged fluorescent dye or a cyanine dye. In one embodiment,
the DNA labeling method of the invention comprises:

a) providing a thermostable DNA polymerase characterized in that 
i) said polymerase comprises the amino acid sequence 
LSXXLX(V/I)PXXE (SEQ ID NO: 7) where X at position 4 can be any amino acid except E

ii) said polymerase has reduced discrimination against incorporation of unconventional nucleotides, and 

b) providing a nucleotide labeled with a negatively charged fluorescent dye, and 
c) performing a DNA synthesis reaction.

[0080] In another embodiment, the DNA labeling method of the invention comprises: 

a) providing a thermostable DNA polymerase characterized in that 

i) said polymerase comprises the amino acid sequence 

LSXXLX(V/I)PXXE (SEQ ID NO: 7) where X at position 4 can be any amino acid except E 

ii) said polymerase has reduced discrimination against incorporation of unconventional nucleotides, and 

b) providing a nucleotide labeled with a cyanine dye, and

c) performing a DNA synthesis reaction. 

[0081] In another aspect of the invention, a thermostable DNA polymerase is provided which combines a mutation
allowing more efficient incorporation of rNTPs, such as the glutamic acid to glycine mutation at position 615 of Taq DNA
polymerase, and the critical motif of this invention. The resulting enzyme is expected to have an increased efficiency of
incorporation of ribonucleotides labeled with fluorescein family dyes. Thus, in one embodiment, the invention provides
a recombinant thermostable DNA polymerase which (1) in its native form comprises the amino acid sequence
LSXXLX(V/I)PXXE (SEQ ID NO: 1) where X is any amino acid and (2) the X at position 4 is mutated such that X at
position 4 is not mutated to E and (3) also comprises the region of criticality which is amino acid sequence SQIXLR(V/I)
(SEQ ID NO: 18) where "X" is any amino acid except E, and (4) is capable of efficient incorporation of ribonucleotides
labeled with fluorescein family dyes. In the three-letter code, the latter sequence is represented as
SerGlnIleXaaLeuArgXaa, where "Xaa" at position 4 is any amino acid except Glu and "Xaa" at position 7 is Val or Ile.
[0082] Mutant polymerase domains such as that for Taq containing the E615G and E681K mutations are useful in
improved methods of producing primer extension products labeled with fluorescein family dyes. For example, in a
primer extension reaction such as PCR, rNTPs labeled with fluorescein family dyes are substituted at least partially for
one of the 4 standard dNTPs and a double mutant polymerase such as E681K E615G Taq DNA polymerase is
included. The mutant polymerase synthesizes primer extension products that have fluorescein-labeled ribonucleotide
residues at various positions along their lengths. Upon heat or alkali treatment, the primer extension products are
fragmented at each ribonucleotide residue, producing a population of end-labeled fragments. This population of uniformly
labeled fragments represents a distribution of the fluorescent label across the length of the primer extension product.
Labeled fragments of these characteristics are useful in nucleic acid detection formats based on silicon chips, such as
that of Cronin et al., supra. Thus, in one embodiment, the invention provides a method of labeling primer extension
products which comprises (1) providing a thermostable DNA polymerase which (a) in its native form comprises the
amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 1) where X is any amino acid, (b) the X at position 4 is mutated
such that X at position 4 is not mutated to E, (c) also comprises the region of criticality which is amino acid sequence
SQIXLR(V/I) (SEQ ID NO: 18) where "X" is any amino acid except E and (d) is capable of efficient incorporation of
ribonucleotides labeled with fluorescein and/or cyanine family dyes, and (2) performing a primer extension reaction.
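The fragmentation step described above can be illustrated with a toy model. This is a sketch under stated assumptions only: lowercase letters stand for dye-labeled ribonucleotide residues, uppercase letters for ordinary deoxynucleotides, and cleavage is assumed to occur on the 3' side of each ribonucleotide so that each upstream fragment ends in a labeled residue. The sequence and function name are hypothetical.

```python
def fragment_at_ribonucleotides(product: str) -> list[str]:
    """Split a primer extension product after every ribonucleotide residue.

    Lowercase letters mark labeled ribonucleotides (an illustrative
    convention, not one used in the specification)."""
    fragments: list[str] = []
    current: list[str] = []
    for residue in product:
        current.append(residue)
        if residue.islower():
            # Cleavage is modeled immediately 3' of the ribonucleotide.
            fragments.append("".join(current))
            current = []
    if current:
        # Trailing deoxynucleotide-only segment carries no label in this model.
        fragments.append("".join(current))
    return fragments


if __name__ == "__main__":
    # Hypothetical product in which labeled rC partially replaces dC.
    pieces = fragment_at_ribonucleotides("ACGcTTAcGGA")
    labeled = [p for p in pieces if p[-1].islower()]
    print(pieces, labeled)
```

Because the labeled rNTP only partially replaces the corresponding dNTP, the positions of the lowercase residues vary from molecule to molecule, which is what distributes the label along the length of the product.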
[0083] In yet another aspect, enzymes having the critical motif of this invention display an increased rate of extension
relative to the wild-type enzyme as shown in Example IV for an E681K F667Y mutant. In one embodiment, the enzyme
is characterized in that (1) in its native form, it comprises the amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 1)
where X is any amino acid, (2) the amino acid sequence is mutated at position 4 such that X at position 4 is not mutated
to E, and (3) it has an increased extension rate relative to the wild-type enzyme. In a preferred embodiment, the enzyme
is characterized in that (1) in its native form, it comprises the amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 1)
where X is any amino acid, (2) the amino acid sequence is mutated at position 4 such that X at position 4 is not mutated
to E and (3) it also comprises the amino acid sequence MRRXXKXXNYXXXYG (SEQ ID NO: 12) where X is any amino
acid and (4) has an increased extension rate. In a more preferred embodiment, the enzyme is characterized in that (1)
in its native form it contains the amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 1) where X is any amino acid
and (2) the amino acid sequence is mutated at position 4 such that X at position 4 is mutated to K. In a most preferred
embodiment, the enzyme is Taq DNA polymerase and contains the E681K mutation and the F667Y mutation. Also
included within this aspect are methods of sequencing and labeling of DNA using the polymerases with increased
extension rate as well as kits for doing the same.

[0084] In a preferred method for DNA sequencing according to the invention, thermostable pyrophosphatase is
included in the reaction mixture. Pyrophosphatase has been shown to enhance sequencing data using mesophilic as
well as the mutant thermostable DNA polymerases described in European Patent Application, Publication No.
EP-A-763 599 and U.S. Patent No. 5,665,551.

[0085] In an exemplified embodiment, the thermostable DNA polymerase of the invention also contains a mutation in
the 5'-nuclease domain that serves to greatly attenuate this nuclease activity. Modified forms of Taq polymerase have
been described in PCT Patent Publication No. WO 92/06200, published April 16, 1992 and in U.S. Patent No.
5,466,591. In one embodiment of that invention, the codon for the glycine residue at amino acid position 46 has been
replaced with a codon for aspartic acid (G46D mutation). The resulting enzyme has enhanced utility in cycle sequencing
reactions due to the decreased 5'-nuclease activity. The polymerase domain amino acid sequence and polymerase
activity are both unchanged in the G46D mutant in comparison to the wild-type enzyme.

[0086] In a commercial embodiment of the invention, kits for practicing methods that are improved by use of the 
present invention are considered to be an additional aspect of the invention. One such kit for DNA sequencing com- 
prises 

a) a thermostable DNA polymerase characterized in that 

i) said polymerase comprises the amino acid sequence 

LSXXLX(V/I)PXXE (SEQ ID NO: 7) where X at position 4 can be any amino acid except E 

ii) said polymerase has reduced discrimination against incorporation of nucleotides labeled with fluorescein 
family dyes, and 

b) a dye-terminator labeled with a negatively charged fluorescent dye and may additionally include other reagents
for DNA sequencing such as dNTPs, thermostable pyrophosphatase and appropriate buffers. In another embodiment,
the enzyme in the kit has the amino acid sequence LSVXLG(V/I)PVKE (SEQ ID NO: 15), whereby X is any
amino acid except E. In the three-letter code, this amino acid sequence is represented as
LeuSerValXaaLeuGlyXaaProValLysGlu (SEQ ID NO: 15), whereby "Xaa" at position 4 is any amino acid except Glu and "Xaa" at position
7 is Val or Ile. In a preferred embodiment, the amino acid sequence is LSVXLGVPVKE (SEQ ID NO: 16), where X
is any amino acid except E. In the three-letter code, this amino acid sequence is represented as
LeuSerValXaaLeuGlyValProValLysGlu (SEQ ID NO: 16), whereby "Xaa" at position 4 is any amino acid except Glu. In a more
preferred embodiment, the "Xaa" at position 4 is Arg. In another preferred embodiment, the amino acid sequence is
LSVXLGIPVKE (SEQ ID NO: 17) where X is any amino acid except E. In the three-letter code, this amino acid
sequence is represented as LeuSerValXaaLeuGlyIleProValLysGlu (SEQ ID NO: 17), whereby "Xaa" at position 4
is any amino acid except Glu. In a more preferred embodiment, the "Xaa" at position 4 is Arg.

[0087] Other kits for DNA sequencing comprise a mutant thermostable DNA polymerase characterized in that 

a) in its native form said polymerase comprises the amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 1) 
where X is any amino acid, 

b) said amino acid sequence is mutated at position 4, except that X at position 4 is not mutated to E; and 

c) said thermostable DNA polymerase has reduced discrimination against incorporation of nucleotides labeled with
fluorescein family dyes in comparison to the native form of said enzyme and may additionally include reagents for
DNA sequencing such as chain terminating compounds, dNTPs, thermostable pyrophosphatase and appropriate
buffers. Where the terminators are labeled, preferable labels are fluorescent dyes, more preferable labels are
negatively charged fluorescent dyes or cyanine dyes, and the most preferable labels are fluorescein family dyes. In a
preferred embodiment, the enzyme in the kit has the amino acid sequence LS(Q/G)XL(S/A)IPYEE (SEQ ID NO: 2),
where X is any amino acid. In the three-letter code, this amino acid sequence is represented as
LeuSerXaaXaaLeuXaaIleProTyrGluGlu (SEQ ID NO: 2), whereby "Xaa" at position 3 is Gln or Gly, "Xaa" at position 4 is any amino
acid, and "Xaa" at position 6 is Ser or Ala. In a more preferred embodiment, the amino acid sequence is
LSQXLAIPYEE (SEQ ID NO: 3), where X is any amino acid. In the three-letter code, this amino acid sequence is
represented as LeuSerGlnXaaLeuAlaIleProTyrGluGlu (SEQ ID NO: 3), whereby "Xaa" at position 4 is any amino acid. In
a most preferred embodiment, the "Xaa" at position 4 is Lys.

[0088] Kits for labeling DNA comprise a thermostable DNA polymerase which is characterized in that (a) in its native
form, the polymerase comprises the amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 1) where X is any amino
acid, (b) the X at position 4 in said sequence is mutated in comparison to said native form, except that X at position 4
is not mutated to E, and (c) the enzyme has reduced discrimination against incorporation of nucleotides labeled with
fluorescein family dyes, in comparison to the corresponding wild-type enzyme and may additionally include dNTPs and
appropriate buffers. In a preferred embodiment, the X at position 4 is mutated to K. Other kits for producing labeled DNA
comprise a) a nucleotide or nucleotide analog labeled with a negatively charged fluorescent compound and b) a native
thermostable DNA polymerase having the following critical motif:

LSXXLX(V/I)PXXE (SEQ ID NO: 7)
where X at position 4 can be any amino acid except E and said polymerase has reduced discrimination against
incorporation of fluorescein-labeled nucleotides, and may additionally include dNTPs and appropriate buffers. In a preferred
embodiment, the X at position 4 is K. In another preferred embodiment, the X at position 4 is R.

[0089] Kits for labeling primer extension products comprise a thermostable DNA polymerase which is characterized
in that (a) in its native form, the polymerase comprises the amino acid sequence LSXXLX(V/I)PXXE (SEQ ID NO: 1)
where X is any amino acid, (b) the X at position 4 in said sequence is mutated in comparison to said native sequence,
except that X at position 4 is not mutated to E, (c) the polymerase also comprises the second amino acid sequence
SQIXLR(V/I) (SEQ ID NO: 18) where "X" is any amino acid except E, (d) the enzyme has reduced discrimination against
incorporation of ribonucleotides labeled with fluorescein family dyes, in comparison to the corresponding wild-type
enzyme, and may additionally include a ribonucleotide or ribonucleotide analog labeled with a negatively charged
fluorescent compound or cyanine compound, dNTPs, and appropriate buffers. In a preferred embodiment, the polymerase
contains an E681K mutation and an E615G mutation. Other kits for producing labeled primer extension products comprise
a) a ribonucleotide or ribonucleotide analog labeled with a negatively charged fluorescent compound or cyanine
compound and b) a native thermostable DNA polymerase characterized in that it (i) comprises the critical motif which is the
amino acid sequence:

LSXXLX(V/I)PXXE (SEQ ID NO: 7)
where X at position 4 can be any amino acid except E, (ii) comprises the second amino acid sequence SQIXLR(V/I)
where "X" is any amino acid except E, and (iii) has reduced discrimination against incorporation of fluorescein-labeled
ribonucleotides, and may additionally include dNTPs and appropriate buffers. In a preferred embodiment, in the first
amino acid sequence, the X at position 4 is a K and in the second amino acid sequence, the X is a G. In another
preferred embodiment, in the first amino acid sequence, the X at position 4 is an R and in the second amino acid sequence,
the X is a G.

[0090] The following examples are offered by way of illustration only and are by no means intended to limit the scope
of the claimed invention. 

Example I 

Expression of a Modified Taq Polymerase Gene Having Reduced Discrimination Against Nucleotides Labeled with
Fluorescein Family Dyes

[0091] The C-terminal amino acid portion of Taq DNA polymerase encodes the polymerase active site domain
(Lawyer et al., 1993, PCR Methods and Applications 2:275-287, Freemont et al., 1986, Proteins: Structure, Function and
Genetics 1:66-73, which are incorporated herein by reference). A DNA fragment containing this region was isolated
from the full-length Taq gene and mutagenized by PCR amplification in the presence of manganese (Leung et al., 1989,
Technique 1(1):11-15). For this example, all restriction enzymes were purchased from New England Biolabs, Beverly
MA. The mutagenized fragments were digested with PstI and BglII and cloned into a Taq expression plasmid, pLK102,
which had been digested with PstI and BglII. Plasmid pLK102 is a derivative of pLK101 in which the 900 bp PstI-BglII
fragment is replaced by a short PstI-BglII linker. Plasmid pLK101 is a modified form of pSYC1578 (Lawyer et al., 1993,
supra and U.S. Patent No. 5,079,352), in which the small HincII/EcoRV fragment located 3' to the polymerase coding
region was deleted.

[0092] The resulting expression plasmids were transformed into E. coli strain N1624 (available from the E. coli
Genetic Stock Center at Yale University, strain No. CGSC #5066) and the resulting transformants were screened for the
ability to more efficiently incorporate [α-32P]Tet-dCTP in comparison to the wild-type enzyme. Using this procedure,
mutant CS1 was identified as having the ability to more efficiently incorporate [α-32P]Tet-dCTP. The mutagenized Taq
expression plasmid of mutant CS1 was digested with HindIII/NheI and the resulting restriction fragment was subcloned
into the wild-type gene of pLK101, replacing the unmutagenized HindIII/NheI fragment, to determine which portion of
the mutagenized Taq polymerase gene was responsible for the altered phenotype. Subclones containing the
HindIII/NheI restriction fragment conferred the altered phenotype on the wild-type enzyme, indicating that the mutation
was within this fragment. Subsequent subclone analysis determined that the mutation was located in the 265 bp
BamHI-NheI fragment.

[0093] DNA sequence analysis of the 265 bp NheI-BamHI fragment was performed on pCS1 using the TaqFS
DyeDeoxy™ Terminator Cycle Sequencing Kit from Applied Biosystems, Foster City, CA, and the Applied Biosystems Model
Prism 377 DNA Sequencing System. The sequence analysis identified a missense mutation in the Taq polymerase
gene at amino acid position 681 that caused a glutamic acid (E) residue to be replaced by a lysine (K) residue.
Numbering is initiated at the codon encoding the first methionine residue of the mature protein, as in U.S. Patent No.
5,079,352, which is herein incorporated by reference. This mutation, E681K, specifically was caused by a GAG to AAG
change in the codon sequence. Plasmid pCS1 was deposited with the ATCC on August 28, 1997, and given accession
No. 98521.
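The codon change reported above can be checked against the standard genetic code. The sketch below includes only the four codons for glutamic acid and lysine; the helper names are illustrative.

```python
# Subset of the standard genetic code: the codons for Glu and Lys only.
CODONS = {"GAG": "Glu", "GAA": "Glu", "AAG": "Lys", "AAA": "Lys"}
ONE_LETTER = {"Glu": "E", "Lys": "K"}


def substitute_base(codon: str, index: int, new_base: str) -> str:
    """Return the codon with the base at 0-based `index` replaced."""
    return codon[:index] + new_base + codon[index + 1:]


def mutation_name(wild_codon: str, mutant_codon: str, position: int) -> str:
    """Name a missense mutation in the E681K style used in the text."""
    wild = ONE_LETTER[CODONS[wild_codon]]
    mutant = ONE_LETTER[CODONS[mutant_codon]]
    return f"{wild}{position}{mutant}"


if __name__ == "__main__":
    mutant_codon = substitute_base("GAG", 0, "A")  # G to A at the first base
    print(mutant_codon, mutation_name("GAG", mutant_codon, 681))
```

A single G to A transition at the first codon position is sufficient to convert the glutamic acid codon GAG into the lysine codon AAG.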

[0094] Plasmid pCS1 may contain additional mutations in the coding sequence for Taq polymerase; however, by
further subcloning experiments, the E681K mutation was determined to be solely responsible for the increased efficiency
in incorporation of nucleoside triphosphates labeled with fluorescein dyes. This point mutation is located in the 265
base pair BamHI-NheI DNA fragment shown in Figure 1. Within the 265 bp DNA fragment, the E681K mutation is the
only change from the wild-type Taq polymerase gene sequence.

[0095] For further analysis and quantitation of the efficiency of incorporation of nucleotide analogues, the 265 bp
BamHI-NheI fragment of plasmid pCS1 was cloned into a Taq expression vector that contained the wild-type sequence
within the polymerase domain, pRDA3-2. Plasmid pRDA3-2, referred to as clone 3-2, is fully described in PCT Patent
Publication No. WO 92/06200, which is incorporated herein by reference. A second clone encoding both the E681K
mutation as well as an F667Y mutation was created by primer-directed mutagenesis and subsequent cloning of a PCR
product containing both mutations into the BamHI-NheI sites of plasmid pRDA3-2.

[0096] Expression vector pRDA3-2 contains the full-length Taq DNA polymerase gene operably linked to the phage
lambda PL promoter. In vector pRDA3-2, the 5'-nuclease domain of the Taq DNA polymerase gene contains a point
mutation at the codon encoding glycine at position 46 that reduces 5'-nuclease activity (G46D mutation). However, the
gene sequence within the polymerase domain of the expression vector pRDA3-2 is identical to the wild-type Taq DNA
polymerase gene sequence. Plasmids pRDA3-2 and pCS1 and the E681K F667Y PCR product were digested with BamHI
and NheI and the 265 bp DNA fragment from plasmid pCS1 or the PCR product was ligated into vector pRDA3-2 by
conventional means. The resulting plasmids, pLK112 and pLK113, respectively, were transformed into E. coli strain
DG116 (ATCC No. 53606). These plasmids encode thermostable DNA polymerases herein referred to as G46D E681K
Taq and G46D E681K F667Y Taq, respectively. The expressed thermostable DNA polymerase protein G46D E681K
F667Y Taq was purified according to the method described by Lawyer et al., 1993, supra.

[0097] The G46D E681K Taq enzyme was purified using a similar, but smaller scale preparation method as follows:
All steps were performed at 4°C unless indicated otherwise. Cells from a 475 ml culture were resuspended in 30 ml of
buffer (50 mM Tris-HCl, pH 7.5, 10 mM EDTA, pH 8.0, 0.5 mM Pefabloc SC, 0.5 µg/ml leupeptin, 0.1 mM
Nα-p-tosyl-L-lysine chloromethyl ketone, 1 mM DTT). Cells were sonicated at 50% duty cycle, setting 5 for 1 minute, and cooled
on ice for 1 minute. This step was repeated twice more. Then 1.5 ml of 4.0 M ammonium sulfate was added and the
mixture heated in a 75°C water bath for 15 minutes, followed by cooling on ice. Polyethyleneimine was added to 0.6%
and the mixture was incubated on ice for 10 minutes. The mixture was centrifuged at 16,000 x g for 30 minutes. The
supernatant was loaded on a 1.8 ml volume phenyl-sepharose column (Bio-Rad Poly-prep chromatography column)
equilibrated with a solution of 50 mM Tris-HCl, pH 7.5, 10 mM EDTA, pH 8.0, 1 mM DTT, 0.2 M (NH4)2SO4. The column
was washed with 6 ml each of three solutions: 1) 25 mM Tris-HCl, pH 7.5, 1 mM EDTA, 1 mM DTT, 0.2 M (NH4)2SO4,
2) 25 mM Tris-HCl, pH 7.5, 1 mM EDTA, 1 mM DTT, and 3) 25 mM Tris-HCl, pH 7.5, 1 mM EDTA, 1 mM DTT, 20%
ethylene glycol. The polymerase was eluted with 6 ml of 25 mM Tris-HCl, pH 7.5, 1 mM EDTA, 1 mM DTT, 20% ethylene
glycol, 2.5 M urea. After adjusting the polymerase preparation to 100 mM KCl with 3 M KCl, the mixture was loaded on
a heparin-sepharose column (1.8 ml volume, Bio-Rad Poly-prep column) equilibrated in 25 mM Tris-HCl, pH 7.5, 1 mM
EDTA, 1 mM DTT, 100 mM KCl. After a wash with the same buffer, the sample was eluted in a buffer of 25 mM Tris-HCl,
pH 7.5, 1 mM EDTA, 1 mM DTT, 400 mM KCl.
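Two of the concentration adjustments in this purification can be worked through with simple dilution arithmetic. The sketch below assumes additive volumes, assumes the starting solution contains none of the added salt, and assumes that the full 6 ml urea eluate is adjusted with KCl; none of these assumptions is stated in the text, and the function names are illustrative.

```python
def final_concentration(stock_conc: float, stock_vol: float,
                        start_vol: float) -> float:
    """Concentration after adding stock_vol of stock to start_vol of solution
    that initially contains none of the solute (additive volumes assumed)."""
    return stock_conc * stock_vol / (start_vol + stock_vol)


def stock_volume_needed(stock_conc: float, target_conc: float,
                        start_vol: float) -> float:
    """Volume of stock to add to start_vol to reach target_conc."""
    if target_conc >= stock_conc:
        raise ValueError("target must be below the stock concentration")
    return target_conc * start_vol / (stock_conc - target_conc)


if __name__ == "__main__":
    # 1.5 ml of 4.0 M ammonium sulfate added to about 30 ml of lysate.
    print(f"ammonium sulfate: {final_concentration(4.0, 1.5, 30.0):.3f} M")
    # 3 M KCl needed to bring an assumed 6 ml eluate to 100 mM KCl.
    print(f"3 M KCl to add: {stock_volume_needed(3.0, 0.1, 6.0):.3f} ml")
```

Under these assumptions the lysate reaches roughly 0.19 M ammonium sulfate, close to the 0.2 M used to equilibrate the phenyl-sepharose column, and about 0.21 ml of 3 M KCl would be needed for the eluate.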

[0098] Following purification, the activity of the modified enzymes was determined by the activity assay described in
Lawyer et al., 1989, J. Biol. Chem. 264:6427-6437, which is incorporated herein by reference. The activity of the
purified enzymes was calculated as follows: one unit of enzyme corresponds to 10 nmoles of product synthesized in 30 min.
DNA polymerase activity is linearly proportional to enzyme concentration up to 80-100 pmoles dCMP incorporated
(diluted enzyme at 0.024-0.03 units/µl). The purified enzymes were utilized in the incorporation and sequencing
reactions described in Examples II-IV.

Example II

Assay to Compare Efficiency of Incorporation of ddNTPs 

[0099] The relative abilities of G46D F667Y Taq, G46D F667Y E681K Taq and F730Y Tma30 DNA polymerases to
incorporate a fluorescein dye family-labeled ddCTP were compared by use of a limiting template, primer extension
competition assay. F730Y Tma30 DNA polymerase is described in Example I of U.S. Serial No. 60/052065, filed July 9,
1997 and in European Patent Application No. 98112327.6, both herein incorporated by reference. In this competition
assay, because the incorporation of a ddCTP terminates the extension reaction, the more readily the polymerase
incorporates a ddCTP into an extended primer, the less [α-33P]dCTP can be incorporated. Thus, as the efficiency of ddCTP
incorporation increases, the extent of inhibition of DNA synthesis is increased. The efficiency of incorporation of ddCTP
is then compared to the efficiency of incorporation of fluorescently labeled ddCTP to give a relative measurement of the
efficiency of incorporation of fluorescently-labeled ddNTPs for a given enzyme.

[0100] The assay was performed as previously described (Lawyer et al., 1989, J. Biol. Chem. 264:6427) including the
following modifications. The assay mixture was composed so the final concentration was 50 mM Bicine pH 8.3, 25°C,
2.5 mM MgCl2, 1 mM β-mercaptoethanol, 20 µM each of dATP, dGTP and dTTP (Perkin-Elmer), 20 µM dCTP
(Perkin-Elmer) and [α-33P]dCTP (New England Nuclear, Boston, MA). M13mp18 (Perkin-Elmer) was annealed to primer DG48,
(SEQ ID NO: 10) and the equivalent of 0.085 pmoles of the annealed template was added to the assay mixture for each
reaction. Thirty-five µl of the assay mixture with template DNA was added to each of 38 0.5 ml Eppendorf tubes.
Dilutions of Zowie-ddCTP in 25 mM CAPSO buffer, pH 9.6 were prepared such that when 10 µl of each was added to the
reaction tube, the final concentration of Zowie-ddCTP would be 3, 1, 0.5, 0.25, 0.125, or 0.0625 µM. For G46D F667Y
Taq DNA polymerase, two tubes each of the 3, 1, 0.5, 0.25, 0.125 µM Zowie-ddCTP were prepared. For G46D F667Y
E681K Taq and F730Y Tma30 DNA polymerases, two tubes each of the 1, 0.5, 0.25, 0.125, and 0.0625 µM
Zowie-ddCTP were prepared. The eight remaining reaction tubes received 10 µl of 25 mM CAPSO buffer, pH 9.6. Thus, each
of the thirty-eight tubes contained 35 µl of assay mix and 10 µl of either 25 mM CAPSO buffer, pH 9.6 or one of the
Zowie-ddCTP dilutions.
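The tube layout and the dilution stocks described above can be tallied as follows. This is a sketch that assumes the quoted final concentrations refer to a 50 µl reaction (35 µl assay mix, 10 µl terminator dilution and 5 µl enzyme); the text does not state the reference volume explicitly, and the names used here are illustrative.

```python
ASSAY_MIX_UL = 35.0
TERMINATOR_UL = 10.0
ENZYME_UL = 5.0  # assumption: volume of enzyme used to start each reaction

# Final Zowie-ddCTP concentrations (µM) prepared for each enzyme.
TUBE_PLAN = {
    "G46D F667Y Taq": [3, 1, 0.5, 0.25, 0.125],
    "G46D F667Y E681K Taq": [1, 0.5, 0.25, 0.125, 0.0625],
    "F730Y Tma30": [1, 0.5, 0.25, 0.125, 0.0625],
}
TUBES_PER_DILUTION = 2
BUFFER_ONLY_TUBES = 8


def stock_concentration(final_um: float) -> float:
    """Concentration (µM) of the 10 µl addition needed for a given final
    concentration in the assumed 50 µl reaction."""
    total_ul = ASSAY_MIX_UL + TERMINATOR_UL + ENZYME_UL
    return final_um * total_ul / TERMINATOR_UL


def total_tubes() -> int:
    """Number of reaction tubes implied by the layout in the text."""
    with_terminator = sum(len(d) * TUBES_PER_DILUTION for d in TUBE_PLAN.values())
    return with_terminator + BUFFER_ONLY_TUBES


if __name__ == "__main__":
    print("tubes:", total_tubes())
    for final in (3, 1, 0.5, 0.25, 0.125, 0.0625):
        print(f"{final} µM final -> {stock_concentration(final)} µM stock")
```

Under the 50 µl assumption each dilution is prepared at five times its intended final concentration, and the tube count agrees with the thirty-eight tubes stated in the text.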

[0101] For each enzyme to be tested, polymerization was initiated in one tube of each Zowie-ddCTP dilution and two
tubes containing the CAPSO buffer alone using 5 µl of the enzyme. The following concentrations of the enzymes were
used, each predetermined to be an excess amount of enzyme for the amount of substrate in the assay: 2.5 units of
F667Y G46D Taq DNA polymerase prepared as in Example I; 1.25 units of G46D, F667Y, E681K Taq DNA polymerase,
prepared as in Example I; or 2 units of F730Y Tma30 DNA polymerase. As a control for the level of background, the
remaining negative control was initiated with enzyme dilution buffer rather than polymerase. All reaction tubes were
immediately vortexed briefly and incubated for 10 minutes at 75°C. The reactions were stopped by addition of 10 µl of 60
mM EDTA and stored at 0°C.

[0102] In an analogous experiment, ddCTP was diluted in 25 mM CAPSO buffer, pH 9.6 such that when 10 µl of each
dilution was added to the reaction tubes, the final concentration would be 0.5, 0.25, 0.125, 0.0625, or 0.0312 µM. Ten
µl of each dilution was pipetted into each of three 0.5 ml Eppendorf tubes containing 35 µl of the assay mixture as
described above. Four tubes containing 35 µl of the assay mix plus 10 µl of 25 mM CAPSO buffer, pH 9.6 were also
prepared. Thus, each of the 19 tubes contained 35 µl of assay mix and 10 µl of either 25 mM CAPSO, pH 9.6 or
one of the ddCTP dilutions.

[0103] Polymerization was initiated in one tube of each ddCTP dilution and one tube of CAPSO buffer with 2.5 units
of G46D F667Y Taq DNA polymerase, 1.25 units of G46D F667Y E681K Taq DNA polymerase or 2 units of F730Y
Tma30 DNA polymerase. The remaining tube containing CAPSO was initiated with enzyme dilution buffer rather than
the polymerase-containing buffer as a negative control. All reactions were immediately vortexed and incubated 10
minutes at 75°C. The reactions were stopped by addition of 10 µl of 60 mM EDTA and stored at 0°C.
[0104] For each reaction, a 50 µl aliquot of the 60 µl reaction was diluted with 1 ml 2 mM EDTA, 50 µg/ml sheared
salmon sperm DNA as a carrier. The DNA was precipitated with TCA using standard procedures and collected on GF/C
filter discs (Whatman). The amount of incorporated [α-33P]dCMP was determined for each sample and normalized to
the CAPSO samples without ddNTP (0% inhibition). The concentration of ddCTP or Zowie-ddCTP needed for 50%
inhibition was calculated for each sample and is shown in Table 2. Comparison of the amount of ddCTP needed to inhibit
synthesis by 50% with the amount of Zowie-ddCTP required to inhibit synthesis by 50% for a particular enzyme reflects
the relative ability of each enzyme to incorporate the fluorescently-labeled analog. These data show that G46D F667Y Taq
DNA polymerase incorporates Zowie-ddCTP least efficiently of the three enzymes tested (ratio of concentrations for
50% inhibition by Zowie-ddCTP vs. ddCTP = 25). F730Y Tma30 DNA polymerase incorporates this labeled analog
more efficiently than G46D F667Y Taq DNA polymerase (ratio of concentrations for 50% inhibition by Zowie-ddCTP vs.
ddCTP = 4), while G46D F667Y E681K Taq DNA polymerase incorporates labeled and unlabeled ddCTP with nearly
equal efficiency (ratio of concentrations for 50% inhibition by Zowie-ddCTP vs. ddCTP = 1.2).

Table 2

Concentration (µM) of Zowie-ddCTP or ddCTP needed for 50% inhibition

DNA polymerase          Zowie-ddCTP   ddCTP   Zowie-ddCTP/ddCTP
G46D F667Y Taq          1.4           0.056   25
G46D F667Y E681K Taq    0.14          0.116   1.2
F730Y Tma30             0.236         0.057   4
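
The ratio column of Table 2 can be recomputed from the two concentration columns. The sketch below simply divides the tabulated values; the dictionary layout and function name are illustrative.

```python
# (Zowie-ddCTP, ddCTP) concentrations for 50% inhibition, as listed in Table 2.
TABLE_2 = {
    "G46D F667Y Taq": (1.4, 0.056),
    "G46D F667Y E681K Taq": (0.14, 0.116),
    "F730Y Tma30": (0.236, 0.057),
}


def inhibition_ratios(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Labeled/unlabeled ratio of 50% inhibition concentrations per enzyme."""
    return {name: labeled / unlabeled
            for name, (labeled, unlabeled) in table.items()}


if __name__ == "__main__":
    for enzyme, ratio in inhibition_ratios(TABLE_2).items():
        print(f"{enzyme}: {ratio:.2f}")
```

The computed values are approximately 25, 1.21 and 4.14, consistent with the rounded ratios of 25, 1.2 and 4 given in the table.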

Example III

Extension Rate Assay 

[0105] The extension rates of G46D F667Y Taq and G46D F667Y E681K Taq were determined using an extension
rate assay. In this experiment, the enzymes were used to extend a primer annealed to an M13 template in the presence
of [α-33P]dCTP. The extension reactions were denatured and the products analyzed by denaturing agarose gel
electrophoresis.

[0106] The assay was performed as previously described (Lawyer et al., 1989, J. Biol. Chem. 264:6427) including the
following modifications. The assay mixture was composed so the final concentration was 50 mM Bicine pH 8.3, 25°C,
2.5 mM MgCl2, 1 mM β-mercaptoethanol, 200 µM each of dATP, dGTP and dTTP (Perkin-Elmer), 100 µM dCTP
(Perkin-Elmer) containing [α-33P]dCTP (New England Nuclear, Boston, MA). M13mp18 (Perkin-Elmer) was annealed to
primer DG48, (SEQ ID NO: 11), and the equivalent of 0.085 pmoles of the annealed template was added to the assay
mixture for each reaction. Forty-five µl of the assay mixture with template DNA was added to each of fourteen 0.5 ml
Eppendorf tubes. Each tube was preincubated at 75°C for at least 30 seconds before the start of the polymerase
reaction.

[0107] Polymerization was initiated in six of the fourteen assay tubes with 5 µl of G46D F667Y Taq DNA polymerase
(2.5 units) and in six more with 5 µl of G46D F667Y E681K Taq DNA polymerase (1.25 units). Both enzymes were prepared
as in Example I, and the concentration used represents a predetermined excess amount of enzyme for the amount of
substrate in the assay. As a control for the level of background, the remaining tubes served as negative controls and
were initiated with enzyme dilution buffer rather than polymerase. All reaction tubes were immediately vortexed briefly
and incubated at 75° C. Two of the six tubes containing G46D F667Y Taq DNA polymerase were incubated for 3 minutes,
two for 6 minutes and two for 10 minutes. Similarly, two of the tubes started with G46D F667Y E681K Taq DNA polymerase
were incubated for 30 seconds, two for 1 minute and two for 2 minutes. The control tubes were incubated for 3 minutes.
The reactions were stopped by addition of 10 µl of 60 mM EDTA and stored at 0° C.

[0108] For each reaction, a 25 µl aliquot of the 60 µl reaction was diluted with 1 ml of 2 mM EDTA containing 50 µg/ml
sheared salmon sperm DNA as a carrier. The DNA was precipitated with TCA using standard procedures and collected on
GF/C filter discs (Whatman). The amount of incorporated [α-33P]dCMP was determined for each sample.

[0109] The remaining 35 µl of each duplicate were combined, and the 70 µl sample was ethanol precipitated, dried
and resuspended in 50 mM NaOH, 1 mM EDTA. Aliquots were removed from these samples such that an equal number
of [α-33P] counts were taken from each. These aliquots were loaded on a 0.9% alkaline agarose gel, electrophoresed,
dried and autoradiographed as previously described (Maniatis et al., 1982, In Molecular Cloning: A Laboratory Manual,
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY). Bacteriophage lambda DNA cut with restriction enzyme
HindIII (BRL) and 5' end-labeled with [32P] was used as a molecular weight standard.

[0110] The length in base pairs of the extension product in each sample was determined by comparison of the
migration distance of each sample with the distance migrated by the lambda DNA size standard. The number of base pairs
in each product was divided by the number of seconds each extension reaction was incubated to give the extension rate,
as shown below.

DNA polymerase            Time       Base Pairs/Sec
G46D F667Y Taq            3 min.     12.5
G46D F667Y Taq            6 min.     12.2
G46D F667Y Taq            10 min.    11.8
G46D F667Y E681K Taq      30 sec.    36.7
G46D F667Y E681K Taq      1 min.     41 J
G46D F667Y E681K Taq      2 min.     52.9


[0111] These results indicate that the presence of the E681K mutation increases the extension rate of a G46D F667Y
enzyme by 3- to 4.3-fold.
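
The rate calculation and the fold increase quoted above can be sketched as follows. This is an illustration only: the source does not state how the 3- to 4.3-fold range was derived, so the sketch assumes each E681K rate is compared with the mean rate of the parent enzyme. The 1 minute E681K entry is omitted because its value is not legible in the source text, and the function name is hypothetical.

```python
from statistics import mean

# Extension rates (base pairs per second) reported in the table above.
# The 1 min. E681K entry is left out because it is not legible in the source.
parent_rates = [12.5, 12.2, 11.8]   # G46D F667Y Taq
mutant_rates = [36.7, 52.9]         # G46D F667Y E681K Taq (30 sec., 2 min.)


def extension_rate(product_length_bp: float, incubation_seconds: float) -> float:
    """Extension rate = product length divided by incubation time."""
    if incubation_seconds <= 0:
        raise ValueError("incubation time must be positive")
    return product_length_bp / incubation_seconds


# Assumption of this sketch: fold increase is taken relative to the
# mean rate of the parent enzyme.
baseline = mean(parent_rates)
fold_increase = [rate / baseline for rate in mutant_rates]

print([round(f, 1) for f in fold_increase])
```

Under that assumption the two legible E681K rates correspond to roughly 3.0- and 4.3-fold increases. As a usage example of the rate formula itself, `extension_rate(2250, 180)` returns 12.5; the 2250 bp length is a hypothetical value back-calculated from the reported 3 minute rate, not a measurement given in the source.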
Example IV

Cycle Sequencing with G46D F667Y E681K Taq DNA Polymerase and Fluorescein Labeled ddNTPs

[0112] This example demonstrates the application of the modified polymerase of the invention to fluorescein dye
labeled dideoxy terminator cycle sequencing, utilizing 1 µM or less ddNTP and a ratio of ddNTP:dNTP of at least 1:100.
The fluorescein dye labeled dideoxy terminators are reagents from the Applied Biosystems PRISM Sequenase®
Terminator Sequencing Kits (Perkin-Elmer, Norwalk, CT) and were optimized for use with Sequenase DNA polymerase
and alpha-thio dNTPs. Cycle sequencing reactions were performed in a 20 µl volume containing 50 mM Tris-HCl (pH
8.8), 2.0 mM MgCl2, 100 µM each dATP, dCTP, and dTTP (Perkin-Elmer, Norwalk, CT), 500 µM dITP (Pharmacia
Biotech, Piscataway, NJ), 0.2 µg M13mp18 single-strand DNA template (Perkin-Elmer), 0.15 µM LacZ Forward Primer
(Perkin-Elmer), 5 units of G46D F667Y E681K Taq DNA polymerase, 20 units of rTth Thermostable Pyrophosphatase
(European Patent Application, Publication No. EP-A-763 599 and U.S. Patent No. 5,665,551), 0.05 µM Sequenase A
Dye Terminator, 0.80 µM Sequenase C Dye Terminator, 0.08 µM Sequenase G Dye Terminator, and 1.0 µM Sequenase
T Dye Terminator. All four Sequenase Dye Terminators were purchased from Perkin-Elmer. Reactions were placed in a
preheated (75° C) Perkin-Elmer GeneAmp® PCR System 9600 thermal cycler and subjected to 25 cycles of 96° C for
10 seconds, 50° C for 5 seconds, and 60° C for 4 minutes. Dye labeled fragments were purified with Centri-Sep™
columns (Princeton Separations, Adelphia, NJ) following the manufacturer's instructions and dried in a vacuum centrifuge.
Pellets were resuspended in 6 µl of deionized formamide:50 mg/mL Blue dextran (in 25 mM EDTA, pH 8.0) 5:1 (v/v),
heated at 90° C for 3 minutes, and directly loaded onto a pre-electrophoresed 4% polyacrylamide/6 M urea gel, then
electrophoresed and analyzed on a Perkin-Elmer ABI PRISM 377 DNA Sequencer according to the manufacturer's
instructions (ABI PRISM 377 DNA Sequencer User's Manual). Automated base-calling by the Perkin-Elmer ABI PRISM
377 DNA Sequencer analysis software resulted in greater than 98.5% accuracy for 450 bases (6 errors for bases +10
to +460 from primer).
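
The terminator-to-dNTP ratios and the base-calling accuracy stated in this example can be checked with a short calculation. The sketch below uses the concentrations given above. Pairing each dye terminator with a competing dNTP is an assumption of the sketch (in particular, dITP is treated as the competitor of the G terminator), and all names are illustrative.

```python
# Final concentrations in µM, taken from the reaction described above.
# Pairing each terminator with a dNTP is an assumption of this sketch;
# in particular, dITP is treated as the competitor of the G terminator.
terminators_uM = {"A": 0.05, "C": 0.80, "G": 0.08, "T": 1.0}
dntps_uM = {"A": 100.0, "C": 100.0, "G": 500.0, "T": 100.0}


def dntp_per_ddntp(base: str) -> float:
    """Molar excess of dNTP over dye terminator for one base."""
    return dntps_uM[base] / terminators_uM[base]


def base_call_accuracy(errors: int, bases_called: int) -> float:
    """Percentage of correctly called bases."""
    if bases_called <= 0:
        raise ValueError("bases_called must be positive")
    return 100.0 * (1 - errors / bases_called)


excess = {base: dntp_per_ddntp(base) for base in terminators_uM}
accuracy = base_call_accuracy(errors=6, bases_called=450)

for base, fold in excess.items():
    print(f"terminator {base} : dNTP = 1:{fold:.0f}")
print(f"accuracy = {accuracy:.2f}%")
```

With these inputs every terminator is at or below 1 µM and is outnumbered by its dNTP at least 100-fold, and 6 errors in 450 bases corresponds to about 98.67% accuracy, consistent with the "greater than 98.5%" figure.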

SEQUENCE LISTING

(1) GENERAL INFORMATION:

    (i) APPLICANT:
        (A) NAME: F. Hoffmann-La Roche Ltd
        (B) STREET: Grenzacherstrasse 124
        (C) CITY: Basel
        (D) STATE: BS
        (E) COUNTRY: Switzerland
        (F) POSTAL CODE (ZIP): CH-4070
        (G) TELEPHONE: (0)61 688 24 03
        (H) TELEFAX: (0)61 688 13 95
        (I) TELEX: 962292/965512 hlr ch

    (ii) TITLE OF INVENTION: Altered Thermostable DNA Polymerases for Sequencing

    (iii) NUMBER OF SEQUENCES: 18

    (iv) COMPUTER READABLE FORM:
        (A) MEDIUM TYPE: Floppy disk
        (B) COMPUTER: IBM PC compatible
        (C) OPERATING SYSTEM: PC-DOS/MS-DOS
        (D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

    (vi) PRIOR APPLICATION DATA:
        (A) APPLICATION NUMBER: US 60/052,065
        (B) FILING DATE: 09-JUL-1997



(2) INFORMATION FOR SEQ ID NO: 1:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 7
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 7 is Val or Ile."

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

        Leu Ser Xaa Xaa Leu Xaa Xaa Pro Xaa Xaa Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 2:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 3..6
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 3 is Gln or Gly. Xaa at position 6 is
            Ser or Ala."

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

        Leu Ser Xaa Xaa Leu Xaa Ile Pro Tyr Glu Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 3:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

        Leu Ser Gln Xaa Leu Ala Ile Pro Tyr Glu Glu
        1               5                   10



(2) INFORMATION FOR SEQ ID NO: 4:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 7
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 7 is Val or Ile."

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

        Leu Ser Val Xaa Leu Gly Xaa Pro Val Lys Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 5:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

        Leu Ser Val Xaa Leu Gly Val Pro Val Lys Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 6:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

        Leu Ser Val Xaa Leu Gly Ile Pro Val Lys Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 7:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 4..7
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 4 is any amino acid but not a glutamic
            acid residue. Xaa at position 7 is Val or Ile."

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

        Leu Ser Xaa Xaa Leu Xaa Xaa Pro Xaa Xaa Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 8:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 4
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 4 is any amino acid except

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

        Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 9:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 2..5
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 2 is Ser or Ala, Xaa at Position
            any amino acid except Glu and Xaa at Position 5 is Leu o:

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

        Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 10:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 4
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 4 is any amino acid except Glu."

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

        Leu Ser Xaa Xaa Leu Xaa Xaa Xaa Xaa Xaa Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 11:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 30 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: DNA (genomic)

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

        GGGAAGGGCG ATCGGTGCGG GCCTCTTCGC

(2) INFORMATION FOR SEQ ID NO: 12:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 15 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

        Met Arg Arg Xaa Xaa Lys Xaa Xaa Asn Tyr Xaa Xaa Xaa Tyr Gly
        1               5                   10                  15

(2) INFORMATION FOR SEQ ID NO: 13:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 3..6
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 3 is Gln or Gly. Xaa at position 4 is
            any amino acid except Glu. Xaa at position 6 is Ser or Ala."

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

        Leu Ser Xaa Xaa Leu Xaa Ile Pro Tyr Glu Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 14:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 4
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 4 is any amino acid except Glu."

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

        Leu Ser Gln Xaa Leu Ala Ile Pro Tyr Glu Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 15:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 4..7
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 4 is any amino acid except Glu. Xaa at
            position 7 is Val or Ile."

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

        Leu Ser Val Xaa Leu Gly Xaa Pro Val Lys Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 16:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 4
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 4 is any amino acid except Glu."

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

        Leu Ser Val Xaa Leu Gly Val Pro Val Lys Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 17:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 11 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear






EP 0 902 035 A2 



    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 4
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 4 is any amino acid except Glu."

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

        Leu Ser Val Xaa Leu Gly Ile Pro Val Lys Glu
        1               5                   10

(2) INFORMATION FOR SEQ ID NO: 18:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 7 amino acids
        (B) TYPE: amino acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: peptide

    (ix) FEATURE:
        (A) NAME/KEY: Region
        (B) LOCATION: 4..7
        (D) OTHER INFORMATION: /label= Xaa
            /note= "Xaa at position 4 is any amino acid except Glu. Xaa at
            position 7 is Val or Ile."

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

        Ser Gln Ile Xaa Leu Arg Xaa
        1               5
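
The motifs in the sequence listing can be expressed as patterns and searched for in a one-letter protein sequence. The sketch below is only an illustration: it encodes SEQ ID NO: 1 (position 7 is Val or Ile, the other Xaa positions unrestricted) and SEQ ID NO: 7 (the same, with position 4 being any residue except Glu) as regular expressions. The two test fragments are 11-residue examples built from SEQ ID NO: 3 with Glu or Lys placed at position 4; they are not full polymerase sequences, and all names in the code are hypothetical.

```python
import re

# Three-letter to one-letter codes for the residues used in the listing.
THREE_TO_ONE = {
    "Leu": "L", "Ser": "S", "Pro": "P", "Glu": "E", "Gln": "Q", "Gly": "G",
    "Ala": "A", "Ile": "I", "Tyr": "Y", "Val": "V", "Lys": "K", "Arg": "R",
}

# SEQ ID NO: 1: Xaa at positions 3, 4, 6, 9, 10 is any residue; position 7 is Val or Ile.
NATIVE_MOTIF = re.compile(r"LS..L.[VI]P..E")
# SEQ ID NO: 7: as above, but position 4 may be any residue except Glu.
MUTANT_MOTIF = re.compile(r"LS.[^E]L.[VI]P..E")


def to_one_letter(three_letter: str) -> str:
    """Convert a space-separated three-letter sequence to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in three_letter.split())


def has_mutant_motif(protein: str) -> bool:
    """True if the sequence contains the motif with a non-Glu residue at position 4."""
    return MUTANT_MOTIF.search(protein) is not None


# Illustrative 11-residue fragments only, not full polymerase sequences.
glu_at_4 = to_one_letter("Leu Ser Gln Glu Leu Ala Ile Pro Tyr Glu Glu")
lys_at_4 = to_one_letter("Leu Ser Gln Lys Leu Ala Ile Pro Tyr Glu Glu")

for fragment in (glu_at_4, lys_at_4):
    print(fragment, bool(NATIVE_MOTIF.search(fragment)), has_mutant_motif(fragment))
```

Both fragments match the general motif of SEQ ID NO: 1, but only the fragment with Lys at position 4 matches the pattern for SEQ ID NO: 7. A regular expression is used here purely for convenience; a position-by-position comparison would work equally well.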



Claims 

1. A recombinant thermostable DNA polymerase which is characterized in that

a) in its native form said polymerase comprises the amino acid sequence LeuSerXaaXaaLeuXaaXaaProXaaXaaGlu
(SEQ ID NO: 1), whereby "Xaa" at positions 3, 4, 6, 9, and 10 of said sequence are any amino acid residue,
and "Xaa" at position 7 of said sequence is Val or Ile;

b) said "Xaa" at position 4 is mutated in comparison to said native sequence, except that "Xaa" at position 4 is
not mutated to Glu; and

c) said thermostable DNA polymerase has a level of discrimination against incorporation of nucleotides labeled
with fluorescein family dyes which is reduced in comparison to the native form of said polymerase.

2. The recombinant thermostable DNA polymerase of claim 1 wherein said nucleotide is a dideoxynucleotide and said 
level of discrimination is at least 3-fold lower than that of said native form of said polymerase. 

3. The recombinant thermostable DNA polymerase of claim 2 wherein said level of discrimination is measured by
determining the concentration of a dideoxynucleotide labeled with a fluorescein dye that is required for 50%
inhibition of DNA synthesis.

4. The recombinant thermostable DNA polymerase of claim 2 wherein said polymerase is from a thermophilic species
selected from the group consisting of Thermosipho africanus, Bacillus caldotenax, and Bacillus
stearothermophilus.

5. The recombinant thermostable DNA polymerase of claim 2 wherein said polymerase is from a Thermus species. 

6. The recombinant thermostable DNA polymerase of claim 5 which is characterized in that

a) in its native form said polymerase comprises the amino acid sequence LeuSerXaaXaaLeuXaaIleProTyrGluGlu
(SEQ ID NO: 2), whereby "Xaa" at position 3 is Gln or Gly, "Xaa" at position 4 is any amino acid, and
"Xaa" at position 6 is Ser or Ala;

b) said "Xaa" at position 4 is mutated in comparison to said native sequence, except that "Xaa" at position 4 is
not mutated to Glu; and

c) said thermostable DNA polymerase has a level of discrimination against incorporation of nucleotides labeled
with fluorescein family dyes which is reduced in comparison to the native form of said polymerase.

7. The recombinant thermostable DNA polymerase of claim 6 which is characterized in that

a) in its native form said polymerase comprises the amino acid sequence LeuSerGlnXaaLeuAlaIleProTyrGluGlu
(SEQ ID NO: 3), whereby "Xaa" at position 4 is any amino acid;

b) said "Xaa" at position 4 is mutated in comparison to said native sequence, except that "Xaa" at position 4 is
not mutated to Glu; and

c) said thermostable DNA polymerase has a level of discrimination against incorporation of nucleotides labeled
with fluorescein family dyes which is reduced in comparison to the native form of said polymerase.

8. The recombinant thermostable DNA polymerase of claim 7 which is characterized in that said "Xaa" at position 4 
is mutated to Lys. 

9. The recombinant thermostable DNA polymerase of claim 2 wherein said polymerase is from a Thermotoga
species.

10. The recombinant thermostable DNA polymerase of claim 9 which is characterized in that

a) in its native form said polymerase comprises the amino acid sequence LeuSerValXaaLeuGlyXaaProValLysGlu
(SEQ ID NO: 4), whereby "Xaa" at position 4 is any amino acid, preferably Arg, and "Xaa" at position 7
is Val or Ile;

b) said "Xaa" at position 4 is mutated in comparison to said native sequence, except that "Xaa" at position 4 is
not mutated to Glu; and

c) said thermostable DNA polymerase has a level of discrimination against incorporation of nucleotides labeled
with fluorescein family dyes which is reduced in comparison to the native form of said polymerase.

11. A nucleic acid sequence encoding a recombinant thermostable DNA polymerase as claimed in any one of claims
1 to 10.

12. A vector comprising a nucleic acid sequence encoding a recombinant thermostable DNA polymerase as claimed 
in any one of claims 1 to 10. 

13. A host cell comprising a nucleic acid sequence encoding a recombinant thermostable DNA polymerase as claimed
in any one of claims 1 to 10.

14. A method for preparing a recombinant thermostable DNA polymerase as claimed in any one of claims 1 to 10, said
method comprises:

(a) culturing a host cell comprising a nucleic acid sequence encoding a recombinant thermostable DNA
polymerase as claimed in any one of claims 1 to 10 under conditions which promote the expression of the
recombinant thermostable DNA polymerase; and

(b) isolating the recombinant thermostable DNA polymerase from the host cell or from the medium.

15. A recombinant thermostable DNA polymerase prepared by the method as claimed in claim 14.

16. A method of DNA sequencing which method comprises:

a) providing a recombinant thermostable DNA polymerase as claimed in any one of claims 1 to 10;

b) providing a dye-terminator labeled with a negatively charged fluorescent dye; and

c) performing a dye-terminator sequencing reaction.

17. The method of claim 16 wherein the recombinant thermostable DNA polymerase has a reduced level of
discrimination against incorporation of nucleotides labeled with fluorescein family dyes and wherein the said
nucleotide is a dideoxynucleotide and said level of discrimination is measured by determining the ratio of the
concentration of a dideoxynucleotide labeled with a fluorescein dye required for 50% inhibition of DNA synthesis
versus the concentration of an unlabeled dideoxynucleotide required for 50% inhibition.

18. The method of claim 17 wherein said ratio is 4 or less. 

19. A method of producing labeled DNA which method comprises: 


a) providing a recombinant thermostable DNA polymerase as claimed in any one of claims 1 to 10; 

b) providing a nucleotide labeled with a fluorescein family dye; and 

c) performing a DNA synthesis reaction. 

20. A method of producing labeled primer extension products which method comprises:

a) providing a recombinant thermostable DNA polymerase as claimed in any one of claims 1 to 10;

i) said polymerase comprises the amino acid sequence LeuSerValXaaLeuGlyXaaProValLysGlu (SEQ ID
NO: 4), whereby "Xaa" at position 4 can be any amino acid except Glu, and "Xaa" at position 7 of this
sequence is Val or Ile,

ii) said polymerase has a reduced level of discrimination against incorporation of nucleotides labeled with
fluorescein family dyes;

iii) said polymerase also comprises the second amino acid sequence SQIXLR(V/I) (SEQ ID NO: 18) where
"X" is any amino acid except E,

iv) said polymerase has reduced discrimination against incorporation of ribonucleotides labeled with
fluorescein family dyes;

b) providing a ribonucleotide labeled with a fluorescein family dye, and

c) performing a primer extension reaction.

21. The use of a recombinant thermostable DNA polymerase as claimed in any one of claims 1 to 10 in an in vitro DNA
synthesis application, such as DNA sequencing, synthesis of labeled DNA and the production of labeled primer
extension products.


22. A kit for DNA sequencing which comprises a thermostable DNA polymerase as claimed in any one of claims 1 to
10 and a terminator labeled with a negatively-charged fluorescent dye.

23. The kit of claim 22 wherein the said recombinant thermostable DNA polymerase has reduced discrimination
against incorporation of nucleotides labeled with fluorescein family dyes and wherein said reduced level of
discrimination is measured by determining the ratio of the concentration of ddNTP labeled with a fluorescein family dye
required for 50% inhibition of DNA synthesis compared to that for an unlabeled ddNTP and said ratio is 4 or less.

24. The kit of claim 23 wherein said level of discrimination is at least 5-fold lower than that of said native form of said
polymerase.

25. A kit for producing labeled DNA which comprises a recombinant thermostable DNA polymerase as claimed in any 
one of claims 1 to 10 and a nucleotide labeled with a negatively-charged fluorescent dye. 
26. A kit for producing labeled primer extension products which comprises a recombinant thermostable DNA
polymerase as claimed in any one of claims 1 to 10 and a ribonucleotide labeled with a fluorescein family dye.

Fig. 1
Taq DNA Polymerase